Re: Index Replication Failure

2020-10-20 Thread Parshant Kumar
Hi all, please check the details

On Sat, Oct 17, 2020 at 5:52 PM Parshant Kumar 
wrote:

>
>
> *The architecture is a master -> repeater -> slave server hierarchy.*
>
> *One of the below exceptions occurs whenever replication fails.*
>
> 1)WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 of 11505507
> bytes)
> java.io.EOFException: Unexpected end of ZLIB input stream
> at
> java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
> at
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
> at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
> at
> org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
> at
> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:139)
> at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
> at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1443)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)
>
> 2)
> WARN : Error getting file length for [segments_568]
> java.nio.file.NoSuchFileException:
> /data/solr/search/application/core-conf/im-search/data/index.20200711012319226/segments_568
> at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
> at
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
> at
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
> at java.nio.file.Files.readAttributes(Files.java:1737)
> at java.nio.file.Files.size(Files.java:2332)
> at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
> at
> org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
> at
> org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:335)
>
>
> 3)
> WARN : Error in fetching file: _4nji.nvd (downloaded 507510784 of
> 555377795 bytes)
> org.apache.http.MalformedChunkCodingException: CRLF expected at end of
> chunk
> at
> org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
> at
> org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
> at
> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
> at
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
> at
> java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
> at
> java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
> at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
> at
> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:128)
> at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1458)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)
> at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1390)
> at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:872)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:438)
> at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)
>
>
> *The replication configuration of the master, repeater, and slave servers is given below:*
>
>  
> 
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>   </lst>
>   <lst name="slave">
>     <str name="pollInterval">00:00:10</str>
>   </lst>
> </requestHandler>
>
>
> *The commit configuration of the master, repeater, and slave servers is given below:*
>
>  
> 10false
>
>
>
>
>
>
> On Sat, Oct 17, 2020 at 5:12 PM Erick Erickson 
> wrote:
>
>> None of your images made it through the mail server. You’ll
>> have to put them somewhere and provide a link.
>>
>> > On Oct 17, 2020, at 5:17 AM, Parshant Kumar <
>> parshant.ku...@indiamart.com.INVALID> wrote:
>> >
>> > Architecture image: If not visible in previous mail
>> >
>> >
>> >
>> >
>> > On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar <
>> parshant.ku...@indiamart.com> wrote:
>> > Hi all,
>> >
>> > We are having solr architecture as below.
>> >
>> >
>> >
>> > We are facing frequent replication failures between the master and
>> repeater server as well as between the repeater and slave servers.
>> > On checking the logs, we found that every time one of the 

Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
*The architecture is a master -> repeater -> slave server hierarchy.*

*One of the below exceptions occurs whenever replication fails.*

1) WARN : Error in fetching file: _4rnu_t.liv (downloaded 0 of 11505507 bytes)
java.io.EOFException: Unexpected end of ZLIB input stream
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
        at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
        at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:88)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:139)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1443)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)

2)
WARN : Error getting file length for [segments_568]
java.nio.file.NoSuchFileException: /data/solr/search/application/core-conf/im-search/data/index.20200711012319226/segments_568
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
        at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
        at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at java.nio.file.Files.size(Files.java:2332)
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
        at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:615)
        at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:588)
        at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:335)


3)
WARN : Error in fetching file: _4nji.nvd (downloaded 507510784 of 555377795 bytes)
org.apache.http.MalformedChunkCodingException: CRLF expected at end of chunk
        at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:255)
        at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
        at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
        at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
        at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:238)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
        at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:79)
        at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:128)
        at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:166)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1458)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1409)
        at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1390)
        at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:872)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:438)
        at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:254)


*The replication configuration of the master, repeater, and slave servers is given below:*

 

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
  <lst name="slave">
    <str name="pollInterval">00:00:10</str>
  </lst>
</requestHandler>
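For context, ${enable.master:false} is Solr's property-substitution syntax (${property:default}), so the same solrconfig.xml can serve all three roles. A hedged sketch of how the master and repeater roles would then typically be switched on, assuming the property is passed as a system property at startup:

    bin/solr start -Denable.master=true

Slaves simply omit the flag and fall back to the default of false.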


*The commit configuration of the master, repeater, and slave servers is given below:*

 
10false
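The XML markup of the commit configuration did not survive the archiver; only the values 10 and false remain. A typical autoCommit block carrying such values might look like the sketch below (whether the 10 belonged to maxDocs or maxTime is not recoverable from the fragment):

    <autoCommit>
      <maxDocs>10</maxDocs>
      <openSearcher>false</openSearcher>
    </autoCommit>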






On Sat, Oct 17, 2020 at 5:12 PM Erick Erickson 
wrote:

> None of your images made it through the mail server. You’ll
> have to put them somewhere and provide a link.
>
> > On Oct 17, 2020, at 5:17 AM, Parshant Kumar <
> parshant.ku...@indiamart.com.INVALID> wrote:
> >
> > Architecture image: If not visible in previous mail
> >
> >
> >
> >
> > On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar <
> parshant.ku...@indiamart.com> wrote:
> > Hi all,
> >
> > We are having solr architecture as below.
> >
> >
> >
> > We are facing frequent replication failures between the master and
> repeater server as well as between the repeater and slave servers.
> > On checking the logs, we found that one of the below exceptions occurred
> every time replication failed.
> >
> > 1)
> >
> > 2)
> >
> >
> > 3)
> >
> >
> > The replication configuration of the master, repeater, and slave servers is given below:
> >
> >
> >
> > The commit configuration of the master, repeater, and slave servers is given below:
> >
> >
> >
> > Replication between master and repeater 

Re: Index Replication Failure

2020-10-17 Thread Erick Erickson
None of your images made it through the mail server. You’ll
have to put them somewhere and provide a link.

> On Oct 17, 2020, at 5:17 AM, Parshant Kumar 
>  wrote:
> 
> Architecture image: If not visible in previous mail
> 
> 
> 
> 
> On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar  
> wrote:
> Hi all,
> 
> We are having solr architecture as below.
> 
> 
> 
> We are facing frequent replication failures between the master and repeater
> server as well as between the repeater and slave servers.
> On checking the logs, we found that one of the below exceptions occurred
> every time replication failed.
> 
> 1)
> 
> 2)
> 
> 
> 3)
> 
> 
> The replication configuration of the master, repeater, and slave servers is given below:
> 
> 
> 
> The commit configuration of the master, repeater, and slave servers is given below:
> 
> 
> 
> Replication between master and repeater occurs every 10 mins.
> Replication between repeater and slave servers occurs every 15 mins between 
> 4-7 am and after that in every 3 hours.
> 
> Thanks,
> Parshant Kumar
> 
> 
> 
> 
> 
> 
> 



Re: Index Replication Failure

2020-10-17 Thread Parshant Kumar
Architecture image: If not visible in previous mail

[image: image.png]


On Sat, Oct 17, 2020 at 2:38 PM Parshant Kumar 
wrote:

> Hi all,
>
> We are having solr architecture as below.
>
>
>
> *We are facing frequent replication failures between the master and repeater
> server as well as between the repeater and slave servers.*
> On checking the logs, we found that one of the below exceptions occurred
> every time replication failed.
>
> 1)
> [image: image.png]
> 2)
> [image: image.png]
>
> 3)
> [image: image.png]
>
> The replication configuration of the master, repeater, and slave servers is given below:
>
> [image: image.png]
>
> The commit configuration of the master, repeater, and slave servers is given below:
>
> [image: image.png]
>
> Replication between master and repeater occurs every 10 mins.
> Replication between repeater and slave servers occurs every 15 mins
> between 4-7 am and after that in every 3 hours.
>
> Thanks,
> Parshant Kumar
>
>
>
>
>
>

-- 



Re: Index Deeply Nested documents and retrieve a full nested document in solr

2020-09-24 Thread Alexandre Rafalovitch
It is yes to both questions, but I am not sure if they play well
together for historical reasons.

For storing/parsing original JSON in any (custom) format:
https://lucene.apache.org/solr/guide/8_6/transforming-and-indexing-custom-json.html
(srcField parameter)
For indexing nested children (with named collections of subdocuments)
but in Solr's own JSON format:
https://lucene.apache.org/solr/guide/8_6/indexing-nested-documents.html

I am not sure whether defining additional fields as in the second document,
but indexing the first way, will work together. Feedback on that
would be useful.
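For illustration, a minimal sketch of the second approach (Solr's own nested JSON), assuming a Solr 8 collection named "trials" whose schema has the default _root_ and _nest_path_ fields and where every document, parent or child, carries the uniqueKey field "id":

    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/trials/update?commit=true' --data-binary '[
      { "id": "NCT0102",
        "title": "Congenital Adrenal Hyperplasia",
        "therapeuticareas": [
          { "id": "ta1", "ta": "Lung Cancer",
            "pubmeds": [ { "id": "pm1", "articleTitle": "Consensus minimum data set" } ] } ],
        "sites": [ { "id": "site1", "institutionname": "Methodist Health System" } ] }]'

The full hierarchy can then be fetched back with the child-document transformer:

    curl 'http://localhost:8983/solr/trials/select?q=id:NCT0102&fl=*,[child]'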

Please also note that Solr is not intended to be the primary storage
(like a database). If you do atomic operations, the stored JSON will
get out of sync as it is not regenerated. Also, for the advanced
searches, you may want to normalize your data in different ways than
those your original data structure has. So, you may want to consider
an architecture where that JSON is stored separately or is retrieved
from the original database, and Solr is focused on good search and
returning just the record ID. That would actually allow you to
store a lot less in Solr (like just IDs) and focus on indexing in the
best way. Not saying it is the right way for your needs, just that is
a non-obvious architecture choice you may want to keep in mind as you
add Solr to your existing stack.

Regards,
   Alex.

On Thu, 24 Sep 2020 at 10:23, Abhay Kumar  wrote:
>
> Hello Team,
>
> Can someone please help to index the below sample json document into Solr.
>
> I have the following queries on indexing multi-level child documents.
>
>
>   1.  Can we specify names for the document hierarchy, such as "therapeuticareas"
> or "sites", while indexing?
>   2.  How can we index documents at multiple levels of the hierarchy?
>
> I have the following queries on retrieving the result.
>
>
>   1.  How can I retrieve results with the full nested structure?
>
> [{
>"id": "NCT0102",
>"title": "Congenital Adrenal Hyperplasia: Calcium Channels as 
> Therapeutic Targets",
>"phase": "Phase 1/Phase 2",
>"status": "Completed",
>"studytype": "Interventional",
>"enrollmenttype": "",
>"sponsorname": ["National Center for Research Resources 
> (NCRR)"],
>"sponsorrole": ["lead"],
>"score": [0],
>"source": "National Center for Research Resources (NCRR)",
>"therapeuticareas": [{
>  "taid": "ta1",
>  "ta": "Lung Cancer",
>  "diseaseAreas": ["Oncology, 
> Respiratory tract diseases"],
>  "pubmeds": [{
> "pmbid": "pm1",
> "articleTitle": 
> "Consensus minimum data set for lung cancer multidisciplinary teams Results 
> of a Delphi process",
> "revisedDate": 
> "2018-12-11T18:30:00Z"
>  }],
>  "conferences": [{
> "confid": "conf1",
> "conferencename": 
> "American Academy of Neurology Annual Meeting",
> 
> "conferencetopic": "Avances en el manejo de los trastornos del movimiento 
> hipercineticos",
> "conferencedate": 
> "2019-05-08T18:30:00Z"
>  }]
>   },
>   {
>  "taid": "ta2",
>  "ta": "Breast Cancer",
>  "diseaseAreas": ["Oncology"],
>  "pubmeds": [],
>  "conferences": []
>   }
>],
>
>"sites": [{
>   "siteid": "site1",
>   "type": "Hospital",
>   "institutionname": "Methodist Health System",
>   "country": "United States",
>   "state": "Texas",
>   "city": "Dallas",
>   "zip": ""
>}],
>
>"investigators": [{
>   "invid": "inv1",
>   "investigatorname": "Bryan A Faller",
>   "role": "Principal Investigator",
>   "location": "",
>  

Re: Index files on Windows fileshare

2020-06-25 Thread Fiz N
Thanks Jason. Appreciate your response.

Thanks
Fiz N.

On Thu, Jun 25, 2020 at 5:42 AM Jason Gerlowski 
wrote:

> Hi Fiz,
>
> Since you're just looking for a POC solution, I think Solr's
> "bin/post" tool would probably help you achieve your first
> requirement.
>
> But I don't think "bin/post" gives you much control over the fields
> that get indexed - if you need the file path to be stored, you might
> be better off writing a small crawler in Java and using SolrJ to do
> the indexing.
>
> Good luck!
>
> Jason
>
> On Fri, Jun 19, 2020 at 9:34 AM Fiz N  wrote:
> >
> > Hello Solr experts,
> >
> > I am using the standalone version of Solr 8.5 on a Windows machine.
> >
> > 1) I want to index all types of files under different directories in the
> > file share.
> >
> > 2) I need to index the absolute path of the files and store it in a Solr field. I
> > need that info so that the end user can click and open the file (pop-up).
> >
> > Could you please tell me how to go about this?
> > This is for POC purposes; once we finalize the solution we will
> > go ahead with a stable approach.
> >
> > Thanks
> > Fiz Nadian.
>


Re: Index files on Windows fileshare

2020-06-25 Thread Jason Gerlowski
Hi Fiz,

Since you're just looking for a POC solution, I think Solr's
"bin/post" tool would probably help you achieve your first
requirement.

But I don't think "bin/post" gives you much control over the fields
that get indexed - if you need the file path to be stored, you might
be better off writing a small crawler in Java and using SolrJ to do
the indexing.
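A rough sketch of what such a crawler could look like (hypothetical core name "files" and field name "path_s"; this variant leans on the extracting request handler so Tika still runs server-side):

    // Hedged sketch: walk a share and post each file to /update/extract,
    // storing the absolute path both as the id and in its own field.
    import java.nio.file.*;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

    public class ShareCrawler {
      public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/files").build()) {
          Files.walk(Paths.get("\\\\server\\share"))
               .filter(Files::isRegularFile)
               .forEach(p -> {
            try {
              ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
              req.addFile(p.toFile(), Files.probeContentType(p));
              req.setParam("literal.id", p.toAbsolutePath().toString());
              req.setParam("literal.path_s", p.toAbsolutePath().toString());
              solr.request(req);
            } catch (Exception e) { e.printStackTrace(); } // skip unreadable files
          });
          solr.commit();
        }
      }
    }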

Good luck!

Jason

On Fri, Jun 19, 2020 at 9:34 AM Fiz N  wrote:
>
> Hello Solr experts,
>
> I am using the standalone version of Solr 8.5 on a Windows machine.
>
> 1) I want to index all types of files under different directories in the
> file share.
>
> 2) I need to index the absolute path of the files and store it in a Solr field. I
> need that info so that the end user can click and open the file (pop-up).
>
> Could you please tell me how to go about this?
> This is for POC purposes; once we finalize the solution we will
> go ahead with a stable approach.
>
> Thanks
> Fiz Nadian.


Re: Index file on Windows fileshare..

2020-06-23 Thread Erick Erickson
The program I pointed you to should take about an hour to make work.

But otherwise, you can try the post tool:
https://lucene.apache.org/solr/guide/7_2/post-tool.html
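On Windows, where the bin/post wrapper script is not available, the underlying SimplePostTool can be invoked directly - a hedged sketch, assuming a collection named "files":

    java -Dc=files -Dauto=yes -Drecursive=yes -jar example\exampledocs\post.jar C:\share\docs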

Best,
Erick

> On Jun 23, 2020, at 8:45 AM, Fiz N  wrote:
> 
> Thanks Erick. Is there an easy way of doing this - indexing files from a Windows
> share folder into Solr?
> This is for POC only.
> 
> Thanks
> Nadian.
> 
> On Mon, Jun 22, 2020 at 3:54 PM Erick Erickson 
> wrote:
> 
>> Consider running Tika in a client and indexing the docs to Solr.
>> At that point, you have total control over what’s indexed.
>> 
>> Here’s a skeletal program to get you started:
>> https://lucidworks.com/post/indexing-with-solrj/
>> 
>> Best,
>> Erick
>> 
>>> On Jun 22, 2020, at 1:21 PM, Fiz N  wrote:
>>> 
>>> Hello Solr experts,
>>> 
>>> I am using the standalone version of Solr 8.5 on a Windows machine.
>>> 
>>> 1) I want to index all types of files under different directories in the
>>> file share.
>>> 
>>> 2) I need to index the absolute path of the files and store it in a Solr field. I
>>> need that info so that the end user can click and open the file (pop-up).
>>> 
>>> Could you please tell me how to go about this?
>>> This is for POC purposes; once we finalize the solution we will
>>> go ahead with a stable approach.
>>> 
>>> Thanks
>>> Fiz Nadian.
>> 
>> 



Re: Index file on Windows fileshare..

2020-06-23 Thread Fiz N
Thanks Erick. Is there an easy way of doing this - indexing files from a Windows
share folder into Solr?
This is for POC only.

Thanks
Nadian.

On Mon, Jun 22, 2020 at 3:54 PM Erick Erickson 
wrote:

> Consider running Tika in a client and indexing the docs to Solr.
> At that point, you have total control over what’s indexed.
>
> Here’s a skeletal program to get you started:
> https://lucidworks.com/post/indexing-with-solrj/
>
> Best,
> Erick
>
> > On Jun 22, 2020, at 1:21 PM, Fiz N  wrote:
> >
> > Hello Solr experts,
> >
> > I am using the standalone version of Solr 8.5 on a Windows machine.
> >
> > 1) I want to index all types of files under different directories in the
> > file share.
> >
> > 2) I need to index the absolute path of the files and store it in a Solr field. I
> > need that info so that the end user can click and open the file (pop-up).
> >
> > Could you please tell me how to go about this?
> > This is for POC purposes; once we finalize the solution we will
> > go ahead with a stable approach.
> >
> > Thanks
> > Fiz Nadian.
>
>


Re: Index file on Windows fileshare..

2020-06-22 Thread Erick Erickson
Consider running Tika in a client and indexing the docs to Solr. 
At that point, you have total control over what’s indexed.

Here’s a skeletal program to get you started:
https://lucidworks.com/post/indexing-with-solrj/
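A bare-bones illustration of that pattern (a sketch only - hypothetical core "files" and field "content_txt"; assumes tika-parsers and solr-solrj are on the classpath):

    // Hedged sketch: parse one file with Tika on the client, then send a
    // plain SolrInputDocument so the indexed fields are fully under your control.
    import java.io.InputStream;
    import java.nio.file.*;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.sax.BodyContentHandler;

    public class TikaIndexer {
      public static void main(String[] args) throws Exception {
        Path file = Paths.get(args[0]);
        BodyContentHandler text = new BodyContentHandler(-1); // -1 = no size limit
        Metadata meta = new Metadata();
        try (InputStream in = Files.newInputStream(file)) {
          new AutoDetectParser().parse(in, text, meta);
        }
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", file.toAbsolutePath().toString()); // absolute path as id
        doc.addField("content_txt", text.toString());
        try (HttpSolrClient solr =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/files").build()) {
          solr.add(doc);
          solr.commit();
        }
      }
    }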

Best,
Erick

> On Jun 22, 2020, at 1:21 PM, Fiz N  wrote:
> 
> Hello Solr experts,
> 
> I am using the standalone version of Solr 8.5 on a Windows machine.
> 
> 1) I want to index all types of files under different directories in the
> file share.
> 
> 2) I need to index the absolute path of the files and store it in a Solr field. I
> need that info so that the end user can click and open the file (pop-up).
> 
> Could you please tell me how to go about this?
> This is for POC purposes; once we finalize the solution we will
> go ahead with a stable approach.
> 
> Thanks
> Fiz Nadian.



Re: Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Florin Babes
Hello,
The patch is to fix the display. It doesn't configure or limit the speed :)


În mar., 16 iun. 2020 la 14:26, Shawn Heisey  a scris:

> On 6/14/2020 12:06 AM, Florin Babes wrote:
> > While checking ways to optimize the speed of replication I've noticed
> that
> > the index download speed is fixed at 5.1 in replication.html. Is there a
> > reason for that? If not, I would like to submit a patch with the fix.
> > We are using solr 8.3.1.
>
> Looking at the replication.html file, the part that says "5.1 MB/s"
> appears to be purely display.  As far as I can tell, it's not
> configuring anything, and it's not gathering information from anywhere.
>
> So unless your solrconfig.xml is configuring a speed limit in the
> replication handler, I don't think there is one.
>
> I'm curious about exactly what you have in mind for a patch.
>
> Thanks,
> Shawn
>


Re: Index download speed while replicating is fixed at 5.1 in replication.html

2020-06-16 Thread Shawn Heisey

On 6/14/2020 12:06 AM, Florin Babes wrote:

While checking ways to optimize the speed of replication I've noticed that
the index download speed is fixed at 5.1 in replication.html. Is there a
reason for that? If not, I would like to submit a patch with the fix.
We are using solr 8.3.1.


Looking at the replication.html file, the part that says "5.1 MB/s" 
appears to be purely display.  As far as I can tell, it's not 
configuring anything, and it's not gathering information from anywhere.


So unless your solrconfig.xml is configuring a speed limit in the 
replication handler, I don't think there is one.


I'm curious about exactly what you have in mind for a patch.

Thanks,
Shawn


Re: index join without query criteria

2020-06-08 Thread Mikhail Khludnev
or probably -director_id:[* TO *]
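A hedged addition: the join parser also accepts a match-all query, so the "no criteria" case can be written without a dummy range:

    fq={!join from=id fromIndex=movie_directors to=director_id}*:*

This keeps every movie whose director_id joins to any document in movie_directors.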

On Mon, Jun 8, 2020 at 10:56 PM Hari Iyer  wrote:

> Hi,
>
> It appears that a query criteria is mandatory for a join. Taking this
> example from the documentation: fq={!join from=id fromIndex=movie_directors
> to=director_id}has_oscar:true. What if I want to find all movies that have
> a director (regardless of whether they have won an Oscar or not)? This
> query: fq={!join from=id fromIndex=movie_directors to=director_id} fails.
> Do I just have to make up a dummy criteria like fq={!join from=id
> fromIndex=movie_directors to=director_id}id:[* TO *]?
>
> Thanks,
> Hari.
>
>

-- 
Sincerely yours
Mikhail Khludnev


Re: Index using CSV file

2020-04-18 Thread Jörn Franke
Please also do not forget to create a schema in the Solr
collection so that the data is correctly indexed and you get fast and
correct query results.
I usually recommend reading one of the many Solr books out there to get
started. This will save you a lot of time.
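For example, fields can be added up front with the Schema API - a sketch, using a hypothetical collection and field:

    curl -X POST -H 'Content-Type: application/json' \
      'http://localhost:8983/solr/mycollection/schema' --data-binary \
      '{ "add-field": { "name": "title", "type": "text_general", "stored": true } }'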

> Am 18.04.2020 um 17:43 schrieb Jörn Franke :
> 
> 
> You don’t do this via the Solr UI. You have many choices, amongst others:
> 1) write a client yourself that parses the CSV and posts it to the standard
> update handler:
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
> 2) use the Solr post tool:
> https://lucene.apache.org/solr/guide/8_4/post-tool.html
> 3) use an HTTP client command line tool (e.g. curl) and post the data to the CSV
> update handler:
> https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
> 
> However, it would be useful to know what exactly you are trying to achieve and
> give more background on the project - what programming languages and
> frameworks you (plan to) use, etc. - to give you a more guided answer.
> 
>>> Am 18.04.2020 um 17:13 schrieb Shravan Kumar Bolla 
>>> :
>>> 
>> Hi,
>> 
>> I'm trying to import data from a CSV file via the Solr UI and I am completely new 
>> to Solr. Please provide the necessary configuration to achieve this.
>> 
>> 


Re: Index using CSV file

2020-04-18 Thread Jörn Franke
You don’t do this via the Solr UI. You have many choices, amongst others:
1) write a client yourself that parses the CSV and posts it to the standard
update handler:
https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
2) use the Solr post tool:
https://lucene.apache.org/solr/guide/8_4/post-tool.html
3) use an HTTP client command line tool (e.g. curl) and post the data to the CSV
update handler:
https://lucene.apache.org/solr/guide/8_4/uploading-data-with-index-handlers.html
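For instance, options 2 and 3 might look like this (a sketch, assuming a local Solr, a collection named "mycollection", and a data.csv whose first line holds the field names):

    bin/post -c mycollection data.csv

    curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
         -H 'Content-Type: application/csv' --data-binary @data.csv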

However, it would be useful to know what exactly you are trying to achieve and give
more background on the project - what programming languages and frameworks you
(plan to) use, etc. - to give you a more guided answer.

> Am 18.04.2020 um 17:13 schrieb Shravan Kumar Bolla 
> :
> 
> Hi,
> 
> I'm trying to import data from a CSV file via the Solr UI and I am completely new 
> to Solr. Please provide the necessary configuration to achieve this.
> 
> 


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
OK, I created the collection from scratch based on the config.

Unfortunately, it does not improve. It is just growing and growing, except
when I stop Solr; then during startup the unnecessary index files are
purged. Even with the previous config this did not happen in older Solr
versions (for sure not in 8.2, in 8.3 maybe, but for sure in 8.4).

Reproduction is simple: just load documents into the index (even during the
first load I observe a significant, roughly fourfold index size increase
that is then reduced after restart).

I observe, though, that during metadata updates (= atomic updates) it
roughly doubles (nowhere near what is expected from the update) and then
shrinks slightly (a few megabytes, nothing compared to the real full
size that the index now has).

At the moment, it looks to me like it is due to the Solr version, because the
config did not change (we have them all versioned, I checked). However,
maybe I am overlooking something.

Furthermore, it seems that during segment merges old segments are not
deleted until restart (but again, this is speculation).
I suspect not many have observed this, because the only ways it would be
observed are 1) indexing a collection completely from scratch and seeing a
huge index file consumption, or 2) updating the collection a lot and hitting
a disk space limit (which in some cases may not happen soon).

I created a JIRA: https://issues.apache.org/jira/browse/SOLR-14202

Please let me know if I can test anything else.

On Tue, Jan 21, 2020 at 10:58 PM Jörn Franke  wrote:

> After testing the update?commit=true I now face an error: "Maximum lock
> count exceeded". Strange - this is the first time I see this in the lock files,
> and it happens when doing commit=true.
> java.lang.Error: Maximum lock count exceeded
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:535)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:494)
> at
> java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1368)
> at
> java.base/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:882)
> at
> org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
> at
> org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:124)
> at
> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
> at
> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:102)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1079)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:220)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
> at
> org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
> at
> 

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
After testing the update?commit=true I now face an error: "Maximum lock
count exceeded". Strange - this is the first time I see this in the lock files,
and it happens when doing commit=true.
java.lang.Error: Maximum lock count exceeded
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:535)
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:494)
at
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1368)
at
java.base/java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:882)
at
org.apache.solr.update.DefaultSolrCoreState.lock(DefaultSolrCoreState.java:179)
at
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:124)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:658)
at
org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:102)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalCommit(DistributedUpdateProcessor.java:1079)
at
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processCommit(DistributedZkUpdateProcessor.java:220)
at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:160)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.update.processor.UpdateRequestProcessor.processCommit(UpdateRequestProcessor.java:68)
at
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:62)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1711)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1347)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1678)
at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1249)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:152)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at 

Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
The only weird thing is I see that for instance I have
${solr.autoCommit.maxTime:15000}  and similar entries.
It looks like a template gone wrong, but this was not caused by
internal development. It must have come from a Solr version.
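A hedged aside: ${solr.autoCommit.maxTime:15000} is not a template gone wrong but Solr's property-substitution syntax, ${property:default} - the default of 15000 ms applies unless the property is supplied at startup, e.g.:

    bin/solr start -Dsolr.autoCommit.maxTime=60000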

On Tue, Jan 21, 2020 at 10:49 PM Jörn Franke  wrote:

> It is, btw, a Linux system and autoSoftCommit is set to -1. However, indeed
> openSearcher is set to false. An explicit commit is issued after doing all the
> updates, but the index is not shrinking. The files are not disappearing
> during shutdown, but they disappear after starting up again.
>
> On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:
>
>> thanks for the answer I will look into it - it is a possible explanation.
>>
>> > Am 20.01.2020 um 14:30 schrieb Erick Erickson > >:
>> >
>> > Jörn:
>> >
>> > The only thing I can think of that _might_ cause this (I’m not all that
>> familiar with the code) is if your solrconfig settings never open a
>> searcher. Either you need to be sure openSearcher is set to true in the
>> autocommit section in solrconfig.xml or your autoSoftCommit is set to
>> something other than -1. Real Time Get requires access to all segments and
>> it takes a new searcher being opened to release them. Actually, a very
>> quick test would be to submit 
>> “http://host:port/solr/collection/update?commit=true”
>> and see if the index shrinks as a result. You don’t need to change
>> solrconfig.xml for that test.
>> >
>> > If you are opening a new searcher, this is very concerning. There
>> shouldn’t be anything else you have to set to prevent the index from
>> growing. Could you check one thing? Compare the directory listing of the
>> data/index directory just before you shut down Solr and then just after.
>> What I’m  interested in is whether some subset of files disappears when you
>> shut down Solr. This assumes you’re running on a *nix system, if Windows
>> you may have to start Solr again to see the difference.
>> >
>> > So if you open a searcher and still see the problem, I can try to
>> reproduce it. Can you share your solrconfig file or at least the autocommit
>> and cache portions?
>> >
>> > Best,
>> > Erick
>> >
>> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> >>
>> >> From what I see it basically duplicates the index files, but does not
>> delete the old ones.
>> >> It uses caffeine cache.
>> >>
>> >> What I observe is that there is an exception when shutting down for
>> the collection that is updated - timeout waiting for all directory ref
>> counts to be released - gave up waiting on CacheDir.
>> >>
>>  Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>> >>>
>> >>> Sorry I missed a line - not tlog is growing but the /data/index
>> folder is growing - until restart when it seems to be purged.
>> >>>
>>  Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>> 
>>  Hi,
>> 
>>  I have a test system here with Solr 8.4 (but this is also
>> reproducible in older Solr versions), which has an index which is growing
>> and growing - until the SolrCloud instance is restarted - then it is
>> reduced to the expected normal size.
>>  The collection is configured to do auto commit after 15000 ms. I
>> expect the index growth comes from the usage of atomic updates, but I
>> would expect that due to the auto commit this does not grow all the time.
>>  After the atomic updates a commit is done in any case.
>> 
>>  I don’t see any error message in the log files, but the growth is
>> quite significant and frequent restarts are not a solution of course.
>> 
>>  Maybe I am overlooking here a tiny configuration issue?
>> 
>>  Thank you.
>> 
>> 
>>  Best regards
>> >
>>
>


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
It is, btw, a Linux system and autoSoftCommit is set to -1. However, indeed
openSearcher is set to false. An explicit commit is issued after doing all the
updates, but the index is not shrinking. The files are not disappearing
during shutdown, but they disappear after starting up again.

On Tue, Jan 21, 2020 at 4:04 PM Jörn Franke  wrote:

> thanks for the answer I will look into it - it is a possible explanation.
>
> > Am 20.01.2020 um 14:30 schrieb Erick Erickson :
> >
> > Jörn:
> >
> > The only thing I can think of that _might_ cause this (I’m not all that
> familiar with the code) is if your solrconfig settings never open a
> searcher. Either you need to be sure openSearcher is set to true in the
> autocommit section in solrconfig.xml or your autoSoftCommit is set to
> something other than -1. Real Time Get requires access to all segments and
> it takes a new searcher being opened to release them. Actually, a very
> quick test would be to submit 
> “http://host:port/solr/collection/update?commit=true”
> and see if the index shrinks as a result. You don’t need to change
> solrconfig.xml for that test.
> >
> > If you are opening a new searcher, this is very concerning. There
> shouldn’t be anything else you have to set to prevent the index from
> growing. Could you check one thing? Compare the directory listing of the
> data/index directory just before you shut down Solr and then just after.
> What I’m  interested in is whether some subset of files disappears when you
> shut down Solr. This assumes you’re running on a *nix system, if Windows
> you may have to start Solr again to see the difference.
> >
> > So if you open a searcher and still see the problem, I can try to
> reproduce it. Can you share your solrconfig file or at least the autocommit
> and cache portions?
> >
> > Best,
> > Erick
> >
> >> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
> >>
> >> From what I see it basically duplicates the index files, but does not
> delete the old ones.
> >> It uses caffeine cache.
> >>
> >> What I observe is that there is an exception when shutting down for the
> collection that is updated - timeout waiting for all directory ref counts
> to be released - gave up waiting on CacheDir.
> >>
>  Am 20.01.2020 um 11:26 schrieb Jörn Franke :
> >>>
> >>> Sorry I missed a line - not tlog is growing but the /data/index
> folder is growing - until restart when it seems to be purged.
> >>>
>  Am 20.01.2020 um 10:47 schrieb Jörn Franke :
> 
>  Hi,
> 
>  I have a test system here with Solr 8.4 (but this is also
> reproducible in older Solr versions), which has an index which is growing
> and growing - until the SolrCloud instance is restarted - then it is
> reduced to the expected normal size.
>  The collection is configured to do auto commit after 15000 ms. I
> expect the index growth comes from the usage of atomic updates, but I
> would expect that due to the auto commit this does not grow all the time.
>  After the atomic updates a commit is done in any case.
> 
>  I don’t see any error message in the log files, but the growth is
> quite significant and frequent restarts are not a solution of course.
> 
>  Maybe I am overlooking here a tiny configuration issue?
> 
>  Thank you.
> 
> 
>  Best regards
> >
>


Re: Index growing and growing until restart

2020-01-21 Thread Jörn Franke
thanks for the answer I will look into it - it is a possible explanation. 

> Am 20.01.2020 um 14:30 schrieb Erick Erickson :
> 
> Jörn:
> 
> The only thing I can think of that _might_ cause this (I’m not all that 
> familiar with the code) is if your solrconfig settings never open a searcher. 
> Either you need to be sure openSearcher is set to true in the autocommit 
> section in solrconfig.xml or your autoSoftCommit is set to something other 
> than -1. Real Time Get requires access to all segments and it takes a new 
> searcher being opened to release them. Actually, a very quick test would be 
> to submit “http://host:port/solr/collection/update?commit=true” and see if 
> the index shrinks as a result. You don’t need to change solrconfig.xml for 
> that test.
> 
> If you are opening a new searcher, this is very concerning. There shouldn’t 
> be anything else you have to set to prevent the index from growing. Could you 
> check one thing? Compare the directory listing of the data/index directory 
> just before you shut down Solr and then just after. What I’m  interested in 
> is whether some subset of files disappears when you shut down Solr. This 
> assumes you’re running on a *nix system, if Windows you may have to start 
> Solr again to see the difference.
> 
> So if you open a searcher and still see the problem, I can try to reproduce 
> it. Can you share your solrconfig file or at least the autocommit and cache 
> portions? 
> 
> Best,
> Erick
> 
>> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
>> 
>> From what I see it basically duplicates the index files, but does not 
>> delete the old ones.
>> It uses caffeine cache.
>> 
>> What I observe is that there is an exception when shutting down for the 
>> collection that is updated - timeout waiting for all directory ref counts to 
>> be released - gave up waiting on CacheDir.
>> 
 Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>>> 
>>> Sorry I missed a line - not tlog is growing but the /data/index folder is 
>>> growing - until restart when it seems to be purged.
>>> 
 Am 20.01.2020 um 10:47 schrieb Jörn Franke :
 
 Hi,
 
 I have a test system here with Solr 8.4 (but this is also reproducible in 
 older Solr versions), which has an index which is growing and growing - 
 until the SolrCloud instance is restarted - then it is reduced to the 
 expected normal size. 
 The collection is configured to do auto commit after 15000 ms. I expect 
 the index growth comes from the usage of atomic updates, but I would 
 expect that due to the auto commit this does not grow all the time.
 After the atomic updates a commit is done in any case.
 
 I don’t see any error message in the log files, but the growth is quite 
 significant and frequent restarts are not a solution of course.
 
 Maybe I am overlooking here a tiny configuration issue? 
 
 Thank you.
 
 
 Best regards
> 


Re: Index growing and growing until restart

2020-01-20 Thread Erick Erickson
Jörn:

The only thing I can think of that _might_ cause this (I’m not all that 
familiar with the code) is if your solrconfig settings never open a searcher. 
Either you need to be sure openSearcher is set to true in the autocommit 
section in solrconfig.xml or your autoSoftCommit is set to something other than 
-1. Real Time Get requires access to all segments and it takes a new searcher 
being opened to release them. Actually, a very quick test would be to submit 
“http://host:port/solr/collection/update?commit=true” and see if the index 
shrinks as a result. You don’t need to change solrconfig.xml for that test.
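For reference, the autocommit section being described would typically look something like this (a sketch; the 15000 ms value comes from this thread, and opening a searcher is what releases the old segments):

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
      <openSearcher>true</openSearcher>
    </autoCommit>

and the quick test is simply:

    curl "http://localhost:8983/solr/collection/update?commit=true"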

If you are opening a new searcher, this is very concerning. There shouldn’t be 
anything else you have to set to prevent the index from growing. Could you 
check one thing? Compare the directory listing of the data/index directory just 
before you shut down Solr and then just after. What I’m interested in is 
whether some subset of files disappears when you shut down Solr. This assumes 
you’re running on a *nix system; on Windows you may have to start Solr again to 
see the difference.

So if you open a searcher and still see the problem, I can try to reproduce it. 
Can you share your solrconfig file or at least the autocommit and cache 
portions? 

Best,
Erick

> On Jan 20, 2020, at 5:40 AM, Jörn Franke  wrote:
> 
> From what I see it basically duplicates the index files, but does not delete 
> the old ones.
> It uses caffeine cache.
> 
> What I observe is that there is an exception when shutting down for the 
> collection that is updated - timeout waiting for all directory ref counts to 
> be released - gave up waiting on CacheDir.
> 
>> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
>> 
>> Sorry I missed a line - not tlog is growing but the /data/index folder is 
>> growing - until restart when it seems to be purged.
>> 
>>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>>> 
>>> Hi,
>>> 
>>> I have a test system here with Solr 8.4 (but this is also reproducible in 
>>> older Solr versions), which has an index which is growing and growing - 
>>> until the SolrCloud instance is restarted - then it is reduced to the 
>>> expected normal size. 
>>> The collection is configured to do auto commit after 15000 ms. I expect the 
>>> index growth comes from the usage of atomic updates, but I would expect 
>>> that due to the auto commit this does not grow all the time.
>>> After the atomic updates a commit is done in any case.
>>> 
>>> I don’t see any error message in the log files, but the growth is quite 
>>> significant and frequent restarts are not a solution of course.
>>> 
>>> Maybe I am overlooking here a tiny configuration issue? 
>>> 
>>> Thank you.
>>> 
>>> 
>>> Best regards



Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
From what I see it basically duplicates the index files, but does not delete 
the old ones.
It uses caffeine cache.

What I observe is that there is an exception when shutting down for the 
collection that is updated - timeout waiting for all directory ref counts to be 
released - gave up waiting on CacheDir.

> Am 20.01.2020 um 11:26 schrieb Jörn Franke :
> 
> Sorry I missed a line - not tlog is growing but the /data/index folder is 
> growing - until restart when it seems to be purged.
> 
>> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
>> 
>> Hi,
>> 
>> I have a test system here with Solr 8.4 (but this is also reproducible in 
>> older Solr versions), which has an index which is growing and growing - 
>> until the SolrCloud instance is restarted - then it is reduced to the 
>> expected normal size. 
>> The collection is configured to do auto commit after 15000 ms. I expect the 
>> index growth comes from the usage of atomic updates, but I would expect 
>> that due to the auto commit this does not grow all the time.
>> After the atomic updates a commit is done in any case.
>> 
>> I don’t see any error message in the log files, but the growth is quite 
>> significant and frequent restarts are not a solution of course.
>> 
>> Maybe I am overlooking here a tiny configuration issue? 
>> 
>> Thank you.
>> 
>> 
>> Best regards


Re: Index growing and growing until restart

2020-01-20 Thread Jörn Franke
Sorry, I missed a line - it is not the tlog that is growing but the /data/index 
folder - until restart, when it seems to be purged.

> Am 20.01.2020 um 10:47 schrieb Jörn Franke :
> 
> Hi,
> 
> I have a test system here with Solr 8.4 (but this is also reproducible in 
> older Solr versions), which has an index which is growing and growing - until 
> the SolrCloud instance is restarted - then it is reduced to the expected 
> normal size. 
> The collection is configured to do auto commit after 15000 ms. I expect the 
> index growth comes from the usage of atomic updates, but I would expect that 
> due to the auto commit this does not grow all the time.
> After the atomic updates a commit is done in any case.
> 
> I don’t see any error message in the log files, but the growth is quite 
> significant and frequent restarts are not a solution of course.
> 
> Maybe I am overlooking here a tiny configuration issue? 
> 
> Thank you.
> 
> 
> Best regards


Re: Index fetch failed

2019-09-03 Thread Erick Erickson
Shankar:

Two things:

1> please do not hijack threads

2> Follow the instructions here: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc. You must use 
the _exact_ same e-mail as you used to subscribe.

If the initial try doesn't work and following the suggestions at the "problems" 
link doesn't work for you, let us know. But note you need to show us the 
_entire_ return header to allow anyone to diagnose the problem.

Best,
Erick

> On Sep 3, 2019, at 5:55 AM, Shankar Ramalingam  wrote:
> 
> Please remove my email id from this list.
> 
> On Tue, 3 Sep, 2019, 11:06 AM Akreeti Agarwal,  wrote:
> 
>> Hello,
>> 
>> Please help me with the solution for the below error.
>> 
>> Memory details of slave server:
>>                    total    used    free   shared   buffers   cached
>> Mem:               15947   15460     487       62       144      6007
>> -/+ buffers/cache:          9308    6639
>> Swap:                  0       0       0
>> 
>> 
>> Thanks & Regards,
>> Akreeti Agarwal
>> 
>> -Original Message-
>> From: Akreeti Agarwal 
>> Sent: Wednesday, August 28, 2019 2:45 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Index fetch failed
>> 
>> Yes I am using solr-5.5.5.
>> This error is intermittent. I don't think it is an issue with the
>> master's connection limits. This error is accompanied by this on the master side:
>> 
>> ERROR (qtp1450821318-60072) [   x:sitecore_web_index]
>> o.a.s.h.ReplicationHandler Unable to get file names for indexCommit
>> generation: 1558637
>> java.nio.file.NoSuchFileException:
>> /solrm-efs/solr-m/server/solr/sitecore_web_index/data/index/_12i9p_1.liv
>>   at
>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>>   at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>   at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>   at
>> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>>   at
>> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>>   at
>> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>>   at java.nio.file.Files.readAttributes(Files.java:1737)
>>   at java.nio.file.Files.size(Files.java:2332)
>>   at
>> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
>>   at
>> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:124)
>>   at
>> org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:563)
>>   at
>> org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:253)
>>   at
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
>>   at
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>>   at
>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>>   at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>>   at
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>>   at
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>>   at
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>>   at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>>   at
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>>   at
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>>   at
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>>   at
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>>   at
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>>   at
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>>   at
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>>   at
>> org.eclipse.jetty.server.handle

Re: Index fetch failed

2019-09-03 Thread Shankar Ramalingam
Please remove my email id from this list.

On Tue, 3 Sep, 2019, 11:06 AM Akreeti Agarwal,  wrote:

> Hello,
>
> Please help me with the solution for the below error.
>
> Memory details of slave server:
>                    total    used    free   shared   buffers   cached
> Mem:               15947   15460     487       62       144      6007
> -/+ buffers/cache:          9308    6639
> Swap:                  0       0       0
>
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Akreeti Agarwal 
> Sent: Wednesday, August 28, 2019 2:45 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Index fetch failed
>
> Yes I am using solr-5.5.5.
> This error is intermittent. I don't think it is an issue with the
> master's connection limits. This error is accompanied by this on the master side:
>
> ERROR (qtp1450821318-60072) [   x:sitecore_web_index]
> o.a.s.h.ReplicationHandler Unable to get file names for indexCommit
> generation: 1558637
> java.nio.file.NoSuchFileException:
> /solrm-efs/solr-m/server/solr/sitecore_web_index/data/index/_12i9p_1.liv
>at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
>at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>at
> sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
>at
> sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
>at
> sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
>at java.nio.file.Files.readAttributes(Files.java:1737)
>at java.nio.file.Files.size(Files.java:2332)
>at
> org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
>at
> org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:124)
>at
> org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:563)
>at
> org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:253)
>at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
>at
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
>at
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
>at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
>at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
>at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
>at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
>at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
>at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>at org.eclipse.jetty.server.Server.handle(Server.java:499)
>at
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
>at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
>at
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
>at
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.ja

RE: Index fetch failed

2019-09-02 Thread Akreeti Agarwal
Hello,

Please help me with a solution for the error below.

Memory details of slave server:
             total       used       free     shared    buffers     cached
Mem:         15947      15460        487         62        144       6007
-/+ buffers/cache:       9308       6639
Swap:            0          0          0


Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Akreeti Agarwal  
Sent: Wednesday, August 28, 2019 2:45 PM
To: solr-user@lucene.apache.org
Subject: RE: Index fetch failed

Yes, I am using solr-5.5.5.
This error is intermittent. I don't think there is any issue with master
connection limits. This error is accompanied by this on the master side:

ERROR (qtp1450821318-60072) [   x:sitecore_web_index] 
o.a.s.h.ReplicationHandler Unable to get file names for indexCommit generation: 
1558637
java.nio.file.NoSuchFileException: 
/solrm-efs/solr-m/server/solr/sitecore_web_index/data/index/_12i9p_1.liv
   at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
   at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
   at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
   at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
   at java.nio.file.Files.readAttributes(Files.java:1737)
   at java.nio.file.Files.size(Files.java:2332)
   at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
   at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:124)
   at 
org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:563)
   at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:253)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
   at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
   at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
   at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
   at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
   at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
   at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Atita Arora 
Sent: Wednesday, August 28, 2019 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Index fetch failed

This looks like ample memory to get the index chunk.
Also, I looked at the IndexFetcher code. I remember you were using Solr
5.5.5, and the only reason, in my view, this would happen is when the index chunk
is not downloaded, as can also be seen in the error (Downloaded
0!=123), which clearly states that the index generations are not in sync and
this is not a user-aborted action either.

RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
Yes, I am using solr-5.5.5.
This error is intermittent. I don't think there is any issue with master
connection limits. This error is accompanied by this on the master side:

ERROR (qtp1450821318-60072) [   x:sitecore_web_index] 
o.a.s.h.ReplicationHandler Unable to get file names for indexCommit generation: 
1558637
java.nio.file.NoSuchFileException: 
/solrm-efs/solr-m/server/solr/sitecore_web_index/data/index/_12i9p_1.liv
   at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
   at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
   at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
   at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
   at java.nio.file.Files.readAttributes(Files.java:1737)
   at java.nio.file.Files.size(Files.java:2332)
   at 
org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:210)
   at 
org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:124)
   at 
org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:563)
   at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:253)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
   at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
   at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
   at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
   at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
   at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
   at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
   at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
   at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
   at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
   at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
   at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
   at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
   at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
   at org.eclipse.jetty.server.Server.handle(Server.java:499)
   at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
   at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
   at 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
   at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
   at java.lang.Thread.run(Thread.java:748)

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Atita Arora  
Sent: Wednesday, August 28, 2019 2:23 PM
To: solr-user@lucene.apache.org
Subject: Re: Index fetch failed

This looks like ample memory to get the index chunk.
Also, I looked at the IndexFetcher code. I remember you were using Solr
5.5.5, and the only reason, in my view, this would happen is when the index chunk
is not downloaded, as can also be seen in the error (Downloaded
0!=123), which clearly states that the index generations are not in sync and
this is not a user-aborted action either.

Is this error intermittent? Could there be a possibility that your master has
connection limits? Or maybe some network hiccup?



On Wed, Aug 28, 2019 at 10:40 AM Akreeti Agarwal  wrote:

> Hi,
>
> Memory details for slave1:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   40G   55G  43% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Memory det

Re: Index fetch failed

2019-08-28 Thread Atita Arora
This looks like ample memory to get the index chunk.
Also, I looked at the IndexFetcher code. I remember you were using Solr
5.5.5, and the only reason, in my view, this would happen is when the index chunk
is not downloaded, as can also be seen in the error (Downloaded
0!=123), which clearly states that the index generations are not in sync and
this is not a user-aborted action either.

Is this error intermittent? Could there be a possibility that your master
has connection limits? Or maybe some network hiccup?
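
(If it helps anyone hitting this later: a quick way to test the "generations
out of sync" theory is to ask the /replication handler on both master and
slave and compare. A minimal SolrJ sketch, with hypothetical host names and
the core name from this thread:)

    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class ReplicationCheck {
        public static void main(String[] args) throws Exception {
            String[] urls = {"http://master:8983/solr/sitecore_web_index",
                             "http://slave:8983/solr/sitecore_web_index"};
            for (String url : urls) {
                try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
                    ModifiableSolrParams p = new ModifiableSolrParams();
                    p.set("command", "indexversion");
                    NamedList<Object> rsp = client.request(
                        new GenericSolrRequest(SolrRequest.METHOD.GET, "/replication", p));
                    // the handler reports the replicable commit point's version/generation
                    System.out.println(url + " -> generation=" + rsp.get("generation")
                        + " indexversion=" + rsp.get("indexversion"));
                }
            }
        }
    }

If the two generations keep diverging while fetches abort, that matches the
"Downloaded 0!=123" symptom in this thread.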



On Wed, Aug 28, 2019 at 10:40 AM Akreeti Agarwal  wrote:

> Hi,
>
> Memory details for slave1:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   40G   55G  43% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Memory details for slave2:
>
> Filesystem  Size  Used Avail Use% Mounted on
> /dev/xvda1   99G   45G   49G  48% /
> tmpfs   7.8G 0  7.8G   0% /dev/shm
>
> Thanks & Regards,
> Akreeti Agarwal
>
> -Original Message-
> From: Atita Arora 
> Sent: Wednesday, August 28, 2019 11:15 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Index fetch failed
>
> Hi,
>
> Do you have enough memory free for the index chunk to be
> fetched/downloaded on the slave node?
>
>
> On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal  wrote:
>
> > Hello Everyone,
> >
> > I am getting this error continuously on the Solr slave; can anyone tell me
> > the solution for this:
> >
> > 642141666 ERROR (indexFetcher-72-thread-1) [   x:sitecore_web_index]
> > o.a.s.h.ReplicationHandler Index fetch failed
> > :org.apache.solr.common.SolrException: Unable to download _12i7v_f.liv
> > completely. Downloaded 0!=123
> >  at
> >
> org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1434)
> >  at
> >
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1314)
> >  at
> >
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:812)
> >  at
> > org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.jav
> > a:427)
> >
> >
> > Thanks & Regards,
> > Akreeti Agarwal
> > (M) +91-8318686601
> >
>


RE: Index fetch failed

2019-08-28 Thread Akreeti Agarwal
Hi,

Memory details for slave1:

Filesystem  Size  Used Avail Use% Mounted on
/dev/xvda1   99G   40G   55G  43% /
tmpfs   7.8G 0  7.8G   0% /dev/shm

Memory details for slave2:

Filesystem  Size  Used Avail Use% Mounted on
/dev/xvda1   99G   45G   49G  48% /
tmpfs   7.8G 0  7.8G   0% /dev/shm

Thanks & Regards,
Akreeti Agarwal

-Original Message-
From: Atita Arora  
Sent: Wednesday, August 28, 2019 11:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Index fetch failed

Hi,

Do you have enough memory free for the index chunk to be fetched/downloaded on
the slave node?


On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal  wrote:

> Hello Everyone,
>
> I am getting this error continuously on the Solr slave; can anyone tell me
> the solution for this:
>
> 642141666 ERROR (indexFetcher-72-thread-1) [   x:sitecore_web_index]
> o.a.s.h.ReplicationHandler Index fetch failed
> :org.apache.solr.common.SolrException: Unable to download _12i7v_f.liv 
> completely. Downloaded 0!=123
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1434)
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1314)
>  at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:812)
>  at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.jav
> a:427)
>
>
> Thanks & Regards,
> Akreeti Agarwal
> (M) +91-8318686601
>
>


Re: Index fetch failed

2019-08-27 Thread Atita Arora
Hi,

Do you have enough memory free for the index chunk to be fetched/downloaded
on the slave node?


On Wed, Aug 28, 2019 at 6:57 AM Akreeti Agarwal  wrote:

> Hello Everyone,
>
> I am getting this error continuously on the Solr slave; can anyone tell me the
> solution for this:
>
> 642141666 ERROR (indexFetcher-72-thread-1) [   x:sitecore_web_index]
> o.a.s.h.ReplicationHandler Index fetch failed
> :org.apache.solr.common.SolrException: Unable to download _12i7v_f.liv
> completely. Downloaded 0!=123
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1434)
>  at
> org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1314)
>  at
> org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:812)
>  at
> org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:427)
>
>
> Thanks & Regards,
> Akreeti Agarwal
> (M) +91-8318686601
>
>


Re: Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Shawn Heisey

On 4/17/2019 3:52 AM, Ritesh Kumar wrote:

Field type in old configuration - string (solr.StrField)   indexed and
stored set to true.
Field type in new configuration - solr.SortableTextField (docValues enabled)


In your schema, you have changed the field class -- from StrField to
SortableTextField -- which, by the way, isn't going to work without a
reindex even if there are no docValues problems.  You also changed the
docValues flag, which can sometimes change the docValues type at the
Lucene level.


If a field has its Lucene docValues type changed, indexing is going to 
fail.  The index will have to be completely deleted, restarted, and 
rebuilt from scratch.  If the index directory is not deleted completely, 
then the error you saw will continue even through a reindex.


Thanks,
Shawn
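
(For "completely deleted": stopping Solr and removing the whole index
directory is the reliable way. A minimal sketch, assuming a hypothetical
dataDir path -- adjust to your core, and stop Solr first:)

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    public class WipeIndexDir {
        public static void main(String[] args) throws IOException {
            Path index = Paths.get("/var/solr/data/mycore/data/index");
            Files.walkFileTree(index, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path f, BasicFileAttributes a)
                        throws IOException {
                    Files.delete(f);  // delete every segment file
                    return FileVisitResult.CONTINUE;
                }
                @Override
                public FileVisitResult postVisitDirectory(Path d, IOException e)
                        throws IOException {
                    Files.delete(d);  // then the directory itself
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }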


Upgrading Solr 6.3.0 to 7.5.0 without having to re-index

2019-04-17 Thread Ritesh Kumar
Hello Team,

I have been trying to upgrade Solr 6.3.0 to 7.5.0 and I do not want to
re-index. I tried it using the Index Upgrader Tool
<https://lucene.apache.org/solr/guide/7_5/indexupgrader-tool.html>. The
tool did its part, and the current index is now in the current file
format.

The problem I am facing is with fields which have docValues enabled in the
current configuration but did not in the earlier configuration.
The error I get is
*java.lang.IllegalStateException: unexpected docvalues type NONE for field
'abc' (expected one of [SORTED, SORTED_SET]). Re-index with correct
docvalues type.*

Field type in old configuration - string (solr.StrField)   indexed and
stored set to true.
Field type in new configuration - solr.SortableTextField (docValues enabled)

Is there any way I can upgrade with the current field configuration without
having to re-index?

Best,

Ritesh Kumar


Re: Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct

2019-04-10 Thread Erick Erickson
"Re-index with correct docvalues”. I.e. define weight to have docValues=true in 
your schema. WARNING: you have to totally get rid of your current data, I’d 
recommend starting with a new collection.
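
(A sketch of the "new collection" route in SolrJ, assuming SolrCloud and
hypothetical names -- the configset must already declare weight with
docValues=true:)

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class FreshCollection {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                // create mycoll_v2 from configset "myconf": 1 shard, 1 replica
                CollectionAdminRequest.createCollection("mycoll_v2", "myconf", 1, 1)
                    .process(client);
            }
        }
    }

Reindex into the new collection, then switch your application (or a collection
alias) over to it.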

> On Apr 10, 2019, at 12:21 AM, Alex Broitman  
> wrote:
> 
> We got the Solr exception when searching in Solr:
>  
> SolrNet.Exceptions.SolrConnectionException: Solr returned an XML error
> response with status=500 (QTime=160). The request parameters included
> q=+(Dashboard Dashboard*), defType=edismax, qf=name nls_NAME___en-us,
> fl=vid:def(rid,id),name,nls_NAME___en-us,nls_NAME_NLS_KEY,txt_display_name,sysid,
> fq=gid:(0 21) -(+type:3 -recipients:5164077) -disabled_types:(16 1024 2048)
> {!acls user="5164077" gid="21" group="34" pcid="6" ecid="174"},
> start=0, rows=20, hl=true, hl.fl=sysid, hl.requireFieldMatch=true,
> hl.usePhraseHighlighter=true, spellcheck=true, spellcheck.collate=true, and
> boost=product(sum(1,product(norm(acl_i),termfreq(acl_i,5164077))),if(exists(weight),weight,1)).
> The error message: unexpected docvalues type NUMERIC for field 'weight'
> (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]).
> Re-index with correct docvalues type.
> java.lang.IllegalStateException: unexpected docvalues type
> NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED,
> SORTED_NUMERIC, SORTED_SET]). Re-index with correct docvalues type.
> at 
> org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
> at 
> org.apache.lucene.index.DocValues.getDocsWithField(DocValues.java:324)
> at 
> org.apache.lucene.queries.function.valuesource.FloatFieldSource.getValues(FloatFieldSource.java:56)
> at 
> org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:48)
> at 
> org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:35)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.getValues(IfFunction.java:47)
> at 
> org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
> at 
> org.apache.lucene.queries.function.BoostedQuery$CustomScorer.init(BoostedQuery.java:124)
> at 
> org.apache.lucene.queries.function.BoostedQuery$CustomScorer.init(BoostedQuery.java:114)
> at 
> org.apache.lucene.queries.function.BoostedQuery$BoostedWeight.scorer(BoostedQuery.java:98)
> at 
> org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
> at 
> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
> at 
> org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
> at 
> org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
> at 
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:217)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1582)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1399)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:566)
> at 
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:545)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> at 
> org.eclipse.jetty.servlet.ServletHand

Solr exception: java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]). Re-index with correct doc

2019-04-10 Thread Alex Broitman
We got the Solr exception when searching in Solr:

SolrNet.Exceptions.SolrConnectionException: Solr returned an XML error
response with status=500 (QTime=160). The request parameters included
q=+(Dashboard Dashboard*), defType=edismax, qf=name nls_NAME___en-us,
fl=vid:def(rid,id),name,nls_NAME___en-us,nls_NAME_NLS_KEY,txt_display_name,sysid,
fq=gid:(0 21) -(+type:3 -recipients:5164077) -disabled_types:(16 1024 2048)
{!acls user="5164077" gid="21" group="34" pcid="6" ecid="174"},
start=0, rows=20, hl=true, hl.fl=sysid, and
boost=product(sum(1,product(norm(acl_i),termfreq(acl_i,5164077))),if(exists(weight),weight,1)).
The error message: unexpected docvalues type NUMERIC for field 'weight'
(expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC, SORTED_SET]).
Re-index with correct docvalues type.
java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field
'weight' (expected one of [BINARY, NUMERIC, SORTED, SORTED_NUMERIC,
SORTED_SET]). Re-index with correct docvalues type.
at 
org.apache.lucene.index.DocValues.checkField(DocValues.java:212)
at 
org.apache.lucene.index.DocValues.getDocsWithField(DocValues.java:324)
at 
org.apache.lucene.queries.function.valuesource.FloatFieldSource.getValues(FloatFieldSource.java:56)
at 
org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:48)
at 
org.apache.lucene.queries.function.valuesource.SimpleBoolFunction.getValues(SimpleBoolFunction.java:35)
at 
org.apache.lucene.queries.function.valuesource.IfFunction.getValues(IfFunction.java:47)
at 
org.apache.lucene.queries.function.valuesource.MultiFloatFunction.getValues(MultiFloatFunction.java:76)
at 
org.apache.lucene.queries.function.BoostedQuery$CustomScorer.init(BoostedQuery.java:124)
at 
org.apache.lucene.queries.function.BoostedQuery$CustomScorer.init(BoostedQuery.java:114)
at 
org.apache.lucene.queries.function.BoostedQuery$BoostedWeight.scorer(BoostedQuery.java:98)
at 
org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
at 
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
at 
org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
at org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
at 
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
at 
org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:217)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1582)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1399)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:566)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:545)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handl

RE: Index database with SolrJ using xml file directly throws an error

2019-03-04 Thread sami
Thanks James,

it works! 





RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread Dyer, James
Instead of dataConfig=data-config.xml, use config=data-config.xml.
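
(Applied to the code from earlier in this thread, that is a one-line change --
a sketch, using the same URL and file name as in the original message:)

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class FullImport {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient server =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/test").build()) {
                ModifiableSolrParams params = new ModifiableSolrParams();
                params.set("qt", "/dataimport");
                params.set("command", "full-import");
                params.set("clean", "true");
                params.set("commit", "true");
                params.set("optimize", "true");
                params.set("config", "data-config.xml");  // "config", not "dataConfig"
                server.query(params);
            }
        }
    }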

From: sami 
Sent: Friday, March 1, 2019 3:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Index database with SolrJ using xml file directly throws an error

Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.

<http://lucene.472066.n3.nabble.com/file/t494676/test1.png>

It works absolutely fine. But, I want to use SolrJ API instead of using the
inbuilt execute function. The data-config.xml and solrconfig.xml works fine
with my database.

I am using the same data-config.xml file and solrconfig.xml file to do the
indexing with program mentioned in my query.

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig","data-config.xml"); *I tried this too. as you
suggested not to use full path. *
server.query(params);

I checked the xml file for any bogus characters too. BUT the same files work
fine with the inbuilt DIH, just not with the code. What could it be?





RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread sami
Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.

 

It works absolutely fine. But, I want to use SolrJ API instead of using the
inbuilt execute function. The data-config.xml and solrconfig.xml works fine
with my database. 

I am using the same data-config.xml file and solrconfig.xml file to do the
indexing with program mentioned in my query. 

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig","data-config.xml");  *I tried this too. as you
suggested not to use full path. *
server.query(params); 

I checked the xml file for any bogus characters too. BUT the same files work
fine with the inbuilt DIH, just not with the code. What could it be?





RE: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Dyer, James
The parameter "dataConfig" should hold an actual xml document to override the 
data-config.xml file you store in zookeeper (cloud) or the configuration 
directory (standalone).  Typically you do not use this parameter.  Instead, 
specify the "config" parameter with the filename (eg. data-config.xml).  This 
file is the DIH configuration, not solrconfig.xml as you are using.  It is just 
the filename, or path starting at the base configuration directory, not a full 
path as you are using.  Unless you want users to override the DIH configuration 
at request time, it is best to specify the filename using the "config" 
parameter in the request handler's invariant section in solrconfig.xml.

From: sami 
Sent: Thursday, February 28, 2019 8:36 AM
To: solr-user@lucene.apache.org
Subject: Index database with SolrJ using xml file directly throws an error

I would like to index my database using SolrJ Java API. I have already tried
to use DIH directly from the Solr server. It works and indexes well. But
when I would like to use the same XML config file with SolrJ it throws an
error.

**Solr version 7.6.0 SolrJ 7.6.0**

Here is the full code I am using:

String url = "http://localhost:8983/solr/test";
String dataConfig =
"D:/solr-7.6.0/server/solr/test/conf/solrconfig.xml";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig",dataConfig);
server.query(params);

But using this piece of code throws an error.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/test: Data Config problem: Content
is not allowed in Prolog.

Am I doing it right? Reference:
https://stackoverflow.com/questions/31446644/how-to-do-solr-dataimport-i-e-from-rdbms-using-java-api/54905578#54905578

Is there any other way to index directly.





Re: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Erick Erickson
That error usually means there are characters (even spaces) at the
_beginning_ of the xml file. DIH may be more forgiving on that front.

Basically, anything preceding the opening tag may cause this error.

Best,
Erick
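
(A tiny sketch for spotting exactly this: it checks whether anything -- a
UTF-8 BOM or stray characters -- precedes the first '<' in the file. The
file name is an assumption.)

    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class PrologCheck {
        public static void main(String[] args) throws Exception {
            byte[] b = Files.readAllBytes(Paths.get("data-config.xml"));
            if (b.length >= 3 && (b[0] & 0xFF) == 0xEF
                    && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
                System.out.println("UTF-8 BOM found before the XML declaration");
            } else if (b.length > 0 && b[0] != '<') {
                System.out.println("First byte is not '<': 0x"
                    + Integer.toHexString(b[0] & 0xFF));
            } else {
                System.out.println("No leading junk detected");
            }
        }
    }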

On Thu, Feb 28, 2019 at 8:24 AM sami  wrote:
>
> I would like to index my database using SolrJ Java API. I have already tried
> to use DIH directly from the Solr server. It works and indexes well. But
> when I would like to use the same XML config file with SolrJ it throws an
> error.
>
> **Solr version 7.6.0 SolrJ 7.6.0**
>
> Here is the full code I am using:
>
> String url = "http://localhost:8983/solr/test";
> String dataConfig =
> "D:/solr-7.6.0/server/solr/test/conf/solrconfig.xml";
> HttpSolrClient server = new 
> HttpSolrClient.Builder(url).build();
> ModifiableSolrParams params = new ModifiableSolrParams();
> params.set("qt", "/dataimport");
> params.set("command", "full-import");
> params.set("clean", "true");
> params.set("commit", "true");
> params.set("optimize", "true");
> params.set("dataConfig",dataConfig);
> server.query(params);
>
> But using this piece of code throws an error.
>
> Exception in thread "main"
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://localhost:8983/solr/test: Data Config problem: Content
> is not allowed in Prolog.
>
> Am I doing it right? Reference:
> https://stackoverflow.com/questions/31446644/how-to-do-solr-dataimport-i-e-from-rdbms-using-java-api/54905578#54905578
>
> Is there any other way to index directly.
>
>
>


Re: index size, stored vs indexed

2018-11-14 Thread Erick Erickson
Can't really be answered. For instance, stored data is held in *.fdt
files and is largely irrelevant to searching since that data is only
consulted for returning stored fields of the top N docs. So if your
index consists of 90% stored data it's one answer, if 10% it's totally
another. the stored data can be swapped in and out of the OS memory
space ('cause it's MMapped) with vastly less impact on your system
than other parts of the index. Certainly if you could fit it all in
memory it'd be as fast as possible, whether enough faster to justify
any extra cost is the question.

Plus you'll want to understand how much data on your particular system
is "too much" and take proactive actions when you approach that limit.

So yeah, you'll have to test. Here's a long blog on the subject:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/.
Skip to the section "Prototyping: how to get a handle on this problem"

Best,
Erick


On Wed, Nov 14, 2018 at 7:28 AM David Hastings
 wrote:
>
> Was wondering if anyone has an idea of the ratio size of indexed only vs
> stored and indexed in solr 7.x.  I was going to run some testing myself
> later today but was curious what others have seen in this regard.
> Thanks,
> David


Re: Index optimization takes too long

2018-11-04 Thread Toke Eskildsen
On Sat, 2018-11-03 at 21:41 -0700, Wei wrote:
> Thanks everyone! I checked the system metrics during the optimization
> process. CPU usage is quite low, there is no I/O wait,  and memory
> usage is not much different from before the docValues change.  So I
> wonder what could be the bottleneck.

Are you looking at overall CPU usage or single-core? When we run force
merge, we have a single core at 100% while the rest are idle.


NB: There is currently a thread "Static index, fastest way to do
forceMerge" in the Lucene users mailinglist, which seem to be quite
parallel to this thread.

- Toke Eskildsen, royal Danish Library




Re: Index optimization takes too long

2018-11-03 Thread Wei
Thanks everyone! I checked the system metrics during the optimization
process. CPU usage is quite low, there is no I/O wait,  and memory usage is
not much different from before the docValues change.  So I wonder what
could be the bottleneck.

Thanks,
Wei

On Sat, Nov 3, 2018 at 1:38 PM Erick Erickson 
wrote:

> Going from my phone so it'll be terse.  See uninvertingmergeupdateprocessor
> (or something like that). Also, there's an idea in SOLR-12259 IIRC, but
> that'll be in 7.6 at the earliest.
>
> On Sat, Nov 3, 2018, 07:13 Shawn Heisey 
> > On 11/3/2018 5:32 AM, Dave wrote:
> > > On a side note, does adding docvalues to an already indexed field, and
> > then optimizing, prevent the need to reindex to take advantage of
> > docvalues? I was under the impression you had to reindex the content.
> >
> > You must reindex when changing the schema to add docValues.  An optimize
> > will not build the new data structures. It will only rebuild the data
> > structures that are already there.
> >
> > Thanks,
> > Shawn
> >
> >
>


Re: Index optimization takes too long

2018-11-03 Thread Erick Erickson
Going from my phone so it'll be terse.  See uninvertingmergeupdateprocessor
(or something like that). Also, there's an idea in SOLR-12259 IIRC, but
that'll be in 7.6 at the earliest.

On Sat, Nov 3, 2018, 07:13 Shawn Heisey wrote:

> On 11/3/2018 5:32 AM, Dave wrote:
> > On a side note, does adding docvalues to an already indexed field, and
> then optimizing, prevent the need to reindex to take advantage of
> docvalues? I was under the impression you had to reindex the content.
>
> You must reindex when changing the schema to add docValues.  An optimize
> will not build the new data structures. It will only rebuild the data
> structures that are already there.
>
> Thanks,
> Shawn
>
>


Re: Index optimization takes too long

2018-11-03 Thread Shawn Heisey

On 11/3/2018 5:32 AM, Dave wrote:

On a side note, does adding docvalues to an already indexed field, and then 
optimizing, prevent the need to reindex to take advantage of docvalues? I was 
under the impression you had to reindex the content.


You must reindex when changing the schema to add docValues.  An optimize 
will not build the new data structures. It will only rebuild the data 
structures that are already there.


Thanks,
Shawn



Re: Index optimization takes too long

2018-11-03 Thread Dave
On a side note, does adding docvalues to an already indexed field, and then 
optimizing, prevent the need to reindex to take advantage of docvalues? I was 
under the impression you had to reindex the content. 

> On Nov 3, 2018, at 4:41 AM, Deepak Goel  wrote:
> 
> I would start by monitoring the hardware (CPU, Memory, Disk) & software
> (heap, threads) utilization's and seeing where the bottlenecks are. Or what
> is getting utilized the most. And then tune that parameter.
> 
> I would also look at profiling the software.
> 
> 
> Deepak
> "The greatness of a nation can be judged by the way its animals are
> treated. Please consider stopping the cruelty by becoming a Vegan"
> 
> +91 73500 12833
> deic...@gmail.com
> 
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
> 
> "Plant a Tree, Go Green"
> 
> Make In India : http://www.makeinindia.com/home
> 
> 
>> On Sat, Nov 3, 2018 at 4:30 AM Wei  wrote:
>> 
>> Hello,
>> 
>> After a recent schema change,  it takes almost 40 minutes to optimize the
>> index.  The schema change is to enable docValues for all sort/facet fields,
>> which increase the index size from 12G to 14G. Before the change it only
>> takes 5 minutes to do the optimization.
>> 
>> I have tried to increase maxMergeAtOnceExplicit because the default 30
>> could be too low:
>> 
>> <int name="maxMergeAtOnceExplicit">100</int>
>> 
>> But it doesn't seem to help. Any suggestions?
>> 
>> Thanks,
>> Wei
>> 


Re: Index optimization takes too long

2018-11-03 Thread Deepak Goel
I would start by monitoring the hardware (CPU, Memory, Disk) & software
(heap, threads) utilization's and seeing where the bottlenecks are. Or what
is getting utilized the most. And then tune that parameter.

I would also look at profiling the software.


Deepak
"The greatness of a nation can be judged by the way its animals are
treated. Please consider stopping the cruelty by becoming a Vegan"

+91 73500 12833
deic...@gmail.com

Facebook: https://www.facebook.com/deicool
LinkedIn: www.linkedin.com/in/deicool

"Plant a Tree, Go Green"

Make In India : http://www.makeinindia.com/home


On Sat, Nov 3, 2018 at 4:30 AM Wei  wrote:

> Hello,
>
> After a recent schema change,  it takes almost 40 minutes to optimize the
> index.  The schema change is to enable docValues for all sort/facet fields,
> which increase the index size from 12G to 14G. Before the change it only
> takes 5 minutes to do the optimization.
>
> I have tried to increase maxMergeAtOnceExplicit because the default 30
> could be too low:
>
> <int name="maxMergeAtOnceExplicit">100</int>
>
> But it doesn't seem to help. Any suggestions?
>
> Thanks,
> Wei
>


Re: Index optimization takes too long

2018-11-02 Thread Shawn Heisey

On 11/2/2018 5:00 PM, Wei wrote:

After a recent schema change,  it takes almost 40 minutes to optimize the
index.  The schema change is to enable docValues for all sort/facet fields,
which increase the index size from 12G to 14G. Before the change it only
takes 5 minutes to do the optimization.


An optimize is not just a straight data copy.  Lucene is actually 
completely recalculating the index data structures.  It will never 
proceed at the full data rate your disks are capable of achieving.


I do not know how docValues actually work during a segment merge, but 
given exactly how the info relates to the inverted index, it's probably 
even more complicated than the rest of the data structures in a Lucene 
index.


On one of the systems I used to manage, back in March of 2017, I was 
seeing a 50GB index take 1.73 hours to optimize.  I do not recall 
whether I had docValues at that point, but I probably did.


http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140

There's not much you can do to make this go faster. Putting massively 
faster CPUs in the machine MIGHT make a difference, but it probably 
wouldn't be a BIG difference.  I'm talking about clock speed, not core 
count.


Thanks,
Shawn
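
(For completeness, the optimize being discussed can be triggered from SolrJ
as below -- a sketch; the URL is an assumption, and per the advice earlier in
these threads, think twice before forcing a single segment:)

    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class ForceMerge {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
                // waitFlush=true, waitSearcher=true, merge down to one segment
                client.optimize(true, true, 1);
            }
        }
    }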



Re: Index fetch failed. Exception: Server refused connection

2018-10-25 Thread Walter Underwood
A 1 Gb heap is probably too small on the master. Run with 8 Gb like the slaves.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 24, 2018, at 10:20 PM, Bharat Yadav  wrote:
> 
> Hello Team,
>  
> We are nowadays frequently facing the below issue on our SOLR Slave Nodes,
> related to the public class SnapPuller.
>  
> Issue 
>  
> Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index fetch
> failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
>
> slroms1@prd02slr:slroms1/JEE/SolrProduct/logs/SolrDomain_SolrSlaveServer2/SolrSlaveServer2>
>  grep "Index fetch failed" weblogic.20181018_004904.log
> 425511535 [snapPuller-14-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index fetch
> failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
> 425511535 [snapPuller-13-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem1 is not available. Index fetch
> failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem1
> 598311531 [snapPuller-14-thread-1] ERROR org.apache.solr.handler.SnapPuller
> – Master at: http://prd01slr:3/solr/MainSystem2 is not available. Index fetch
> failed. Exception: Server refused connection at:
> http://prd01slr:3/solr/MainSystem2
>
>  
> Note –
> MainSystem1 and MainSystem2 are the Cores in our account.
> When we face this issue, sometimes we have to bounce our SOLR JVMs,
> and sometimes it recovers automatically and we don't need to do any bounce.
>  
> SetUp of SOLR In Account
>  
> ·SOLR Version
>  
> 
>  
> ·We are using SOLR in a 1-master, 2-slave configuration.
>  
> a) Master is running with “-Xms1G -Xmx1G -XX:MaxPermSize=256m”
> b) Slaves are running with “-Xms8G -Xmx8G -XX:MaxPermSize=1024m”
>  
> ·The SOLR ear is deployed on all 3 individual Weblogic instances and is
> the same across all of them.
> ·Indexing is done on the Master, and we have replication + polling
> enabled on the Slave JVMs to keep them in sync with the Master at any time.
> All querying is handled by the SOLR Slaves only.
> ·For polling we have defined a timing of 60 sec, as highlighted
> below in the Slave solr xml. (I am attaching the solr xml configured for the
> slave and master for your reference.)
>  
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>
>   <lst name="master">
>     <str name="enable">${enable.master:false}</str>
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt</str>
>   </lst>
>
>   <lst name="slave">
>     <str name="enable">true</str>
>     <str name="masterUrl">http://xx:x/solr/MainSystem2</str>
>     <str name="pollInterval">00:00:60</str>
>   </lst>
>
> </requestHandler>
>  
> 
> ·We have GC logging enabled on the JVMs too, but we didn’t find anything
> suspicious there. If you need the GC logs, let us know.
>  
> Connectivity Check
>  
> ·Slave 1 – 
>  
>  
> ·Slave 2 –
>   
>  
> Statistics about the Core
>  
> 
>  
>  
> Thanks and Regards
> Bharat Yadav
> Infra INT
> Amdocs Intelligent Operations, SI
> India Mob  +91-987464 (WhatsApp only)
> Chile Mob  +56-998022829
>  
> 
> 
>  
>  
> 



Re: Index size issue in SOLR-6.5.1

2018-10-08 Thread Dominique Bejean
HI,

In the Solr Admin console, you can access the "Segment
info" page for each core. You can see whether there are more deleted
documents in the segments on server X.

Dominique
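
(The same numbers are available remotely via the /admin/luke handler; a
SolrJ sketch with a hypothetical core URL:)

    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.GenericSolrRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.util.NamedList;

    public class DeletedDocs {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient client =
                     new HttpSolrClient.Builder("http://serverX:8983/solr/mycore").build()) {
                ModifiableSolrParams p = new ModifiableSolrParams();
                p.set("numTerms", "0");  // skip the per-field term stats
                NamedList<Object> rsp = client.request(
                    new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/luke", p));
                NamedList<?> index = (NamedList<?>) rsp.get("index");
                System.out.println("numDocs=" + index.get("numDocs")
                    + " maxDoc=" + index.get("maxDoc")
                    + " deletedDocs=" + index.get("deletedDocs"));
            }
        }
    }

A big gap between maxDoc and numDocs on one server but not the other would
explain the size difference.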

Le lun. 8 oct. 2018 à 07:29, SOLR4189  a écrit :

> Which details do you ask about? Yesterday we restarted all our solr
> services, and the index size in serverX decreased from 82Gb to 60Gb, while
> in serverY the index size didn't change (49Gb).
>
>
>
>


Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread SOLR4189
Which details do you ask about? Yesterday we restarted all our solr services,
and the index size in serverX decreased from 82Gb to 60Gb, while in serverY the
index size didn't change (49Gb).





Re: Index size issue in SOLR-6.5.1

2018-10-07 Thread Dominique Bejean
Hi,

What about the cores' segment details in the admin UI? More deleted
documents?

Regards

Dominique

Le dim. 7 oct. 2018 à 08:22, SOLR4189  a écrit :

> Hi all,
>
> We use SOLR-6.5.1 and we have very strange issue. In our collection index
> size is very different from server to server (33gb difference):
> 1. We have index size 82Gb in serverX and 49Gb in serverY
> 2. ServerX displays 82gb of used space if we run "df -h
> /opt/solr/Xxx_shardX_replica1/data/index",
> and through the web admin ui it displays 60gb of used space.
>
> What can it be? Why do we have a difference between servers? Between a
> server and
> the web admin ui?
>
> Thank you.
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index Upgrader tool

2018-08-24 Thread Shawn Heisey

On 8/24/2018 12:44 AM, dami...@gmail.com wrote:

Shawn, is it possible to run optimize on the live collection? For example,
/solr/collection/update?commit=true&optimize=true


For all the reasons in the blog post that Erick referenced, we recommend 
that you do not do this.


Something to note:  That optimize operation is *EXACTLY* what 
IndexUpgrader does.  So when it comes right down to it, I wouldn't 
recommend using IndexUpgrader either.


Solr 6 can read Solr 5 indexes directly, no "upgrade" required.  But as 
you've found, if you take a version 5 index and upgrade it to 6 (which 
is really just an optimize/forceMerge), then try to upgrade it again to 
version 7, it may not work.


The gist of all this is that I do not recommend using indexes from a 
previous version AT ALL.  You should build the indexes from scratch 
using the new version.


Thanks,
Shawn
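
(Since IndexUpgrader is just that optimize under the hood, the programmatic
form is tiny. A sketch -- Solr must be stopped, the index path is an
assumption, it only carries an index forward one major version, and it needs
lucene-core plus lucene-backward-codecs on the classpath:)

    import java.nio.file.Paths;
    import org.apache.lucene.index.IndexUpgrader;
    import org.apache.lucene.store.FSDirectory;

    public class Upgrade {
        public static void main(String[] args) throws Exception {
            try (FSDirectory dir =
                     FSDirectory.open(Paths.get("/var/solr/data/mycore/data/index"))) {
                new IndexUpgrader(dir).upgrade();  // rewrites all segments in the current format
            }
        }
    }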



Re: Index Upgrader tool

2018-08-24 Thread Erick Erickson
Yes, it's possible to run optimize on a live index. I wouldn't though, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

Lucene has _never_ guaranteed proper functioning of an index created
with version X-2 when used with
version X. It hasn't been super-obvious, but we had a long discussion
about it on here:

https://issues.apache.org/jira/browse/LUCENE-8264

but here's the most succinct statement of why upgrading more than one
major version isn't
really possible from Robert Muir:

"I think the key issue here is Lucene is an index not a database.
Because it is a lossy index and does not retain all of the user's
data, its not possible to safely migrate some things automagically."

So Lucene labors heroically to maintain 1 version back-compat, but
that's all that's guaranteed.

So I'd _really_ recommend you reindex if at all possible.

Best,
Erick

On Thu, Aug 23, 2018 at 11:45 PM  wrote:
>
> Shawn, is it possible to run optimize on the live collection? For example,
> /solr/collection/update?commit=true&optimize=true
>
> On Wed, 22 Aug 2018 at 06:50, Shawn Heisey  wrote:
>
> > On 8/21/2018 2:29 AM, Artjoms Laivins wrote:
> > > We are running Solr cloud with 3 nodes v. 6.6.2
> > > We started with version 5 so we have some old index that we need safely
> > move over to v. 7 now.
> > > New data comes in several times per day.
> > > Our questions are:
> > >
> > > Should we run IndexUpgrader tool on one slave node that is down or it is
> > safe to run it while Solr is running and possible updates of the index are
> > coming?
> > > If yes, when we start it again will leader update this node with new
> > data only or will it overwrite index?
> >
> > It might not be possible to upgrade two major versions like that, even
> > with IndexUpgrader.  There is only a guarantee of reading an index
> > ORIGINALLY written by the previous major version.
> >
> > Even if it's possible to accomplish an upgrade, it is strongly
> > recommended that you index from scratch anyway.
> >
> > You cannot run IndexUpgrader while Solr has the index open.  The index
> > must be completely closed.  You cannot update an index while it is being
> > upgraded.
> >
> > Thanks,
> > Shawn
> >
> >


Re: Index Upgrader tool

2018-08-24 Thread damienk
Shawn, is it possible to run optimize on the live collection? For example,
/solr/collection/update?commit=true&optimize=true

On Wed, 22 Aug 2018 at 06:50, Shawn Heisey  wrote:

> On 8/21/2018 2:29 AM, Artjoms Laivins wrote:
> > We are running Solr cloud with 3 nodes v. 6.6.2
> > We started with version 5 so we have some old index that we need safely
> move over to v. 7 now.
> > New data comes in several times per day.
> > Our questions are:
> >
> > Should we run IndexUpgrader tool on one slave node that is down or it is
> safe to run it while Solr is running and possible updates of the index are
> coming?
> > If yes, when we start it again will leader update this node with new
> data only or will it overwrite index?
>
> It might not be possible to upgrade two major versions like that, even
> with IndexUpgrader.  There is only a guarantee of reading an index
> ORIGINALLY written by the previous major version.
>
> Even if it's possible to accomplish an upgrade, it is strongly
> recommended that you index from scratch anyway.
>
> You cannot run IndexUpgrader while Solr has the index open.  The index
> must be completely closed.  You cannot update an index while it is being
> upgraded.
>
> Thanks,
> Shawn
>
>


Re: Index Upgrader tool

2018-08-21 Thread Shawn Heisey

On 8/21/2018 2:29 AM, Artjoms Laivins wrote:

We are running Solr cloud with 3 nodes, v. 6.6.2.
We started with version 5, so we have some old index that we need to safely move
over to v. 7 now.
New data comes in several times per day.
Our questions are:

Should we run the IndexUpgrader tool on one slave node that is down, or is it safe
to run it while Solr is running and updates to the index may be coming in?
If yes, when we start it again, will the leader update this node with new data only,
or will it overwrite the index?


It might not be possible to upgrade two major versions like that, even 
with IndexUpgrader.  There is only a guarantee of reading an index 
ORIGINALLY written by the previous major version.


Even if it's possible to accomplish an upgrade, it is strongly 
recommended that you index from scratch anyway.


You cannot run IndexUpgrader while Solr has the index open.  The index 
must be completely closed.  You cannot update an index while it is being 
upgraded.


Thanks,
Shawn



Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Someone needs to update the Ref Guide. That can be a patch submitted on a
JIRA issue, or a committer could forego a patch and make changes directly
with commits.

Otherwise, this wiki page is making a bad situation even worse.
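
(For anyone who lands here looking for the substance of that recommendation:
the gist of "custom indexing with Tika" is to run Tika in your own client
process and send Solr plain fields via SolrJ, so a parser hang or crash
cannot take a Solr node down. A minimal sketch, with hypothetical core and
field names:)

    import java.io.File;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class TikaIndexer {
        public static void main(String[] args) throws Exception {
            Tika tika = new Tika();  // parsing happens client-side, not inside Solr
            try (HttpSolrClient solr =
                     new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build()) {
                File f = new File(args[0]);
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", f.getName());
                doc.addField("content", tika.parseToString(f));  // extracted text only
                solr.add(doc);
                solr.commit();
            }
        }
    }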

On Tue, May 29, 2018 at 12:06 PM Tim Allison  wrote:

> I’m happy to contribute to this message in any way I can.  Let me know how
> I can help.
>
> On Tue, May 29, 2018 at 2:31 PM Cassandra Targett 
> wrote:
>
> > It's not as simple as a banner. Information was added to the wiki that
> does
> > not exist in the Ref Guide.
> >
> > Before you say "go look at the Ref Guide" you need to make sure it says
> > what you want it to say, and the creation of this page just 3 days ago
> > indicates to me that the Ref Guide is missing something.
> >
> > On Tue, May 29, 2018 at 1:04 PM Erick Erickson 
> > wrote:
> >
> > > On further reflection, +1 to marking the Wiki page superseded by the
> > > reference guide. I'd be fine with putting a banner at the top of all
> > > the Wiki pages saying "check the Solr reference guide first" ;)
> > >
> > > On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
> > >  wrote:
> > > > Couldn't the same information on that page be put into the Solr Ref
> > > Guide?
> > > >
> > > > I mean, if that's what we recommend, it should be documented
> officially
> > > > that it's what we recommend.
> > > >
> > > > I mean, is anyone surprised people keep stumbling over this? Shawn's
> > wiki
> > > > page doesn't point to the Ref Guide (instead pointing at other wiki
> > pages
> > > > that are out of date) and the Ref Guide doesn't point to that page.
> So
> > > half
> > > > the info is in our "official" place but the real story is in another
> > > place,
> > > > one we alternately tell people to sometimes ignore but sometimes keep
> > up
> > > to
> > > > date? Even I'm confused.
> > > >
> > > > On Sat, May 26, 2018 at 6:41 PM Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks! Now I can just record the URL and then paste it in ;)
> > > >>
> > > >> Who knows, maybe people will see it first too!
> > > >>
> > > >> On Sat, May 26, 2018 at 9:48 AM, Tim Allison 
> > > wrote:
> > > >> > W00t! Thank you, Shawn!
> > > >> >
> > > >> > The "don't use ERH in production" response comes up frequently
> > enough
> > > >> >> that I have created a wiki page we can use for responses:
> > > >> >>
> > > >> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> > > >> >>
> > > >> >> Tim, you are extremely well-qualified to expand and correct this
> > > page.
> > > >> >> Erick may be interested in making adjustments also. The flow of
> the
> > > page
> > > >> >> feels a little bit awkward to me, but I'm not sure how to improve
> > it.
> > > >> >>
> > > >> >> If the page name is substandard, feel free to rename.  I've
> already
> > > >> >> renamed it once!  I searched for an existing page like this
> before
> > I
> > > >> >> started creating it.  I did put a link to the new page on the
> > > >> >> ExtractingRequestHandler page.
> > > >> >>
> > > >> >> Thanks,
> > > >> >> Shawn
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
>


Re: Index protected zip

2018-05-29 Thread Tim Allison
I’m happy to contribute to this message in any way I can.  Let me know how
I can help.

On Tue, May 29, 2018 at 2:31 PM Cassandra Targett 
wrote:

> It's not as simple as a banner. Information was added to the wiki that does
> not exist in the Ref Guide.
>
> Before you say "go look at the Ref Guide" you need to make sure it says
> what you want it to say, and the creation of this page just 3 days ago
> indicates to me that the Ref Guide is missing something.
>
> On Tue, May 29, 2018 at 1:04 PM Erick Erickson 
> wrote:
>
> > On further reflection, +1 to marking the Wiki page superseded by the
> > reference guide. I'd be fine with putting a banner at the top of all
> > the Wiki pages saying "check the Solr reference guide first" ;)
> >
> > On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
> >  wrote:
> > > Couldn't the same information on that page be put into the Solr Ref
> > Guide?
> > >
> > > I mean, if that's what we recommend, it should be documented officially
> > > that it's what we recommend.
> > >
> > > I mean, is anyone surprised people keep stumbling over this? Shawn's
> wiki
> > > page doesn't point to the Ref Guide (instead pointing at other wiki
> pages
> > > that are out of date) and the Ref Guide doesn't point to that page. So
> > half
> > > the info is in our "official" place but the real story is in another
> > place,
> > > one we alternately tell people to sometimes ignore but sometimes keep
> up
> > to
> > > date? Even I'm confused.
> > >
> > > On Sat, May 26, 2018 at 6:41 PM Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > >> Thanks! Now I can just record the URL and then paste it in ;)
> > >>
> > >> Who knows, maybe people will see it first too!
> > >>
> > >> On Sat, May 26, 2018 at 9:48 AM, Tim Allison 
> > wrote:
> > >> > W00t! Thank you, Shawn!
> > >> >
> > >> > The "don't use ERH in production" response comes up frequently
> enough
> > >> >> that I have created a wiki page we can use for responses:
> > >> >>
> > >> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> > >> >>
> > >> >> Tim, you are extremely well-qualified to expand and correct this
> > page.
> > >> >> Erick may be interested in making adjustments also. The flow of the
> > page
> > >> >> feels a little bit awkward to me, but I'm not sure how to improve
> it.
> > >> >>
> > >> >> If the page name is substandard, feel free to rename.  I've already
> > >> >> renamed it once!  I searched for an existing page like this before
> I
> > >> >> started creating it.  I did put a link to the new page on the
> > >> >> ExtractingRequestHandler page.
> > >> >>
> > >> >> Thanks,
> > >> >> Shawn
> > >> >>
> > >> >>
> > >>
> >
>


Re: Index protected zip

2018-05-29 Thread Cassandra Targett
It's not as simple as a banner. Information was added to the wiki that does
not exist in the Ref Guide.

Before you say "go look at the Ref Guide" you need to make sure it says
what you want it to say, and the creation of this page just 3 days ago
indicates to me that the Ref Guide is missing something.

On Tue, May 29, 2018 at 1:04 PM Erick Erickson 
wrote:

> On further reflection, +1 to marking the Wiki page superseded by the
> reference guide. I'd be fine with putting a banner at the top of all
> the Wiki pages saying "check the Solr reference guide first" ;)
>
> On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
>  wrote:
> > Couldn't the same information on that page be put into the Solr Ref
> Guide?
> >
> > I mean, if that's what we recommend, it should be documented officially
> > that it's what we recommend.
> >
> > I mean, is anyone surprised people keep stumbling over this? Shawn's wiki
> > page doesn't point to the Ref Guide (instead pointing at other wiki pages
> > that are out of date) and the Ref Guide doesn't point to that page. So
> half
> > the info is in our "official" place but the real story is in another
> place,
> > one we alternately tell people to sometimes ignore but sometimes keep up
> to
> > date? Even I'm confused.
> >
> > On Sat, May 26, 2018 at 6:41 PM Erick Erickson 
> > wrote:
> >
> >> Thanks! Now I can just record the URL and then paste it in ;)
> >>
> >> Who knows, maybe people will see it first too!
> >>
> >> On Sat, May 26, 2018 at 9:48 AM, Tim Allison 
> wrote:
> >> > W00t! Thank you, Shawn!
> >> >
> >> > The "don't use ERH in production" response comes up frequently enough
> >> >> that I have created a wiki page we can use for responses:
> >> >>
> >> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> >> >>
> >> >> Tim, you are extremely well-qualified to expand and correct this
> page.
> >> >> Erick may be interested in making adjustments also. The flow of the
> page
> >> >> feels a little bit awkward to me, but I'm not sure how to improve it.
> >> >>
> >> >> If the page name is substandard, feel free to rename.  I've already
> >> >> renamed it once!  I searched for an existing page like this before I
> >> >> started creating it.  I did put a link to the new page on the
> >> >> ExtractingRequestHandler page.
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >>
> >> >>
> >>
>


Re: Index protected zip

2018-05-29 Thread Erick Erickson
On further reflection, +1 to marking the Wiki page superseded by the
reference guide. I'd be fine with putting a banner at the top of all
the Wiki pages saying "check the Solr reference guide first" ;)

On Tue, May 29, 2018 at 10:59 AM, Cassandra Targett
 wrote:
> Couldn't the same information on that page be put into the Solr Ref Guide?
>
> I mean, if that's what we recommend, it should be documented officially
> that it's what we recommend.
>
> I mean, is anyone surprised people keep stumbling over this? Shawn's wiki
> page doesn't point to the Ref Guide (instead pointing at other wiki pages
> that are out of date) and the Ref Guide doesn't point to that page. So half
> the info is in our "official" place but the real story is in another place,
> one we alternately tell people to sometimes ignore but sometimes keep up to
> date? Even I'm confused.
>
> On Sat, May 26, 2018 at 6:41 PM Erick Erickson 
> wrote:
>
>> Thanks! Now I can just record the URL and then paste it in ;)
>>
>> Who knows, maybe people will see it first too!
>>
>> On Sat, May 26, 2018 at 9:48 AM, Tim Allison  wrote:
>> > W00t! Thank you, Shawn!
>> >
>> > The "don't use ERH in production" response comes up frequently enough
>> >> that I have created a wiki page we can use for responses:
>> >>
>> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>> >>
>> >> Tim, you are extremely well-qualified to expand and correct this page.
>> >> Erick may be interested in making adjustments also. The flow of the page
>> >> feels a little bit awkward to me, but I'm not sure how to improve it.
>> >>
>> >> If the page name is substandard, feel free to rename.  I've already
>> >> renamed it once!  I searched for an existing page like this before I
>> >> started creating it.  I did put a link to the new page on the
>> >> ExtractingRequestHandler page.
>> >>
>> >> Thanks,
>> >> Shawn
>> >>
>> >>
>>


Re: Index protected zip

2018-05-29 Thread Cassandra Targett
Couldn't the same information on that page be put into the Solr Ref Guide?

I mean, if that's what we recommend, it should be documented officially
that it's what we recommend.

I mean, is anyone surprised people keep stumbling over this? Shawn's wiki
page doesn't point to the Ref Guide (instead pointing at other wiki pages
that are out of date) and the Ref Guide doesn't point to that page. So half
the info is in our "official" place but the real story is in another place,
one we alternately tell people to sometimes ignore but sometimes keep up to
date? Even I'm confused.

On Sat, May 26, 2018 at 6:41 PM Erick Erickson 
wrote:

> Thanks! Now I can just record the URL and then paste it in ;)
>
> Who knows, maybe people will see it first too!
>
> On Sat, May 26, 2018 at 9:48 AM, Tim Allison  wrote:
> > W00t! Thank you, Shawn!
> >
> > The "don't use ERH in production" response comes up frequently enough
> >> that I have created a wiki page we can use for responses:
> >>
> >> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
> >>
> >> Tim, you are extremely well-qualified to expand and correct this page.
> >> Erick may be interested in making adjustments also. The flow of the page
> >> feels a little bit awkward to me, but I'm not sure how to improve it.
> >>
> >> If the page name is substandard, feel free to rename.  I've already
> >> renamed it once!  I searched for an existing page like this before I
> >> started creating it.  I did put a link to the new page on the
> >> ExtractingRequestHandler page.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>


Re: Index protected zip

2018-05-26 Thread Erick Erickson
Thanks! Now I can just record the URL and then paste it in ;)

Who knows, maybe people will see it first too!

On Sat, May 26, 2018 at 9:48 AM, Tim Allison  wrote:
> W00t! Thank you, Shawn!
>
> The "don't use ERH in production" response comes up frequently enough
>> that I have created a wiki page we can use for responses:
>>
>> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>>
>> Tim, you are extremely well-qualified to expand and correct this page.
>> Erick may be interested in making adjustments also. The flow of the page
>> feels a little bit awkward to me, but I'm not sure how to improve it.
>>
>> If the page name is substandard, feel free to rename.  I've already
>> renamed it once!  I searched for an existing page like this before I
>> started creating it.  I did put a link to the new page on the
>> ExtractingRequestHandler page.
>>
>> Thanks,
>> Shawn
>>
>>


Re: Index protected zip

2018-05-26 Thread Tim Allison
W00t! Thank you, Shawn!

The "don't use ERH in production" response comes up frequently enough
> that I have created a wiki page we can use for responses:
>
> https://wiki.apache.org/solr/RecommendCustomIndexingWithTika
>
> Tim, you are extremely well-qualified to expand and correct this page.
> Erick may be interested in making adjustments also. The flow of the page
> feels a little bit awkward to me, but I'm not sure how to improve it.
>
> If the page name is substandard, feel free to rename.  I've already
> renamed it once!  I searched for an existing page like this before I
> started creating it.  I did put a link to the new page on the
> ExtractingRequestHandler page.
>
> Thanks,
> Shawn
>
>


Re: Index protected zip

2018-05-26 Thread Shawn Heisey

On 5/26/2018 4:52 AM, Tim Allison wrote:

Please see Erick Erickson’s evergreen advice and linked blog post:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201805.mbox/%3ccan4yxve_0gn0a1y7wjpr27inuddo6+jzwwfgvzkfs40gh3r...@mail.gmail.com%3e


The "don't use ERH in production" response comes up frequently enough 
that I have created a wiki page we can use for responses:


https://wiki.apache.org/solr/RecommendCustomIndexingWithTika

Tim, you are extremely well-qualified to expand and correct this page.  
Erick may be interested in making adjustments also. The flow of the page 
feels a little bit awkward to me, but I'm not sure how to improve it.


If the page name is substandard, feel free to rename.  I've already 
renamed it once!  I searched for an existing page like this before I 
started creating it.  I did put a link to the new page on the 
ExtractingRequestHandler page.


Thanks,
Shawn



Re: Index protected zip

2018-05-26 Thread Tim Allison
On third thought, I can’t think of how you’d easily inject a
PasswordProvider into Solr’s integration.

Please see Erick Erickson’s evergreen advice and linked blog post:

https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201805.mbox/%3ccan4yxve_0gn0a1y7wjpr27inuddo6+jzwwfgvzkfs40gh3r...@mail.gmail.com%3e


On Sat, May 26, 2018 at 6:34 AM Tim Allison  wrote:

> You’ll need to provide a PasswordProvider in the ParseContext.  I don’t
> think that is currently possible in the Solr integration. Please open a
> ticket if SolrJ doesn’t meet your needs.
>
> On Thu, May 24, 2018 at 1:03 PM Alexandre Rafalovitch 
> wrote:
>
>> Hmm. If it works, then it is Tika magic, which may mean Tika has a
>> setting for passwords, which would need to be configured and then exposed
>> through Solr.
>>
>> So, I would check if you can extract text with Tika standalone first.
>>
>> Regards,
>> Alex
>>
>> On Thu, May 24, 2018, 5:05 AM Dimitris Kardarakos, <
>> dimitris.kardara...@iteam.gr> wrote:
>>
>> > Hello everyone.
>> >
>> > In Solr 7.3.0 I can successfully index the content of zip files.
>> >
>> > But if the zip file is password protected, running something like the
>> > below:
>> >
>> > curl "http://localhost:8983/solr/sample/update/extract?commit=true&literal.id=enc.zip&resource.password=1234" \
>> >   -H "Content-Type: application/zip" --data-binary @enc.zip
>> >
>> > only the names of the files contained are indexed.
>> >
>> > Is it a known issue, or am I doing something wrong?
>> >
>> > Thanks!
>> >
>> > --
>> > Dimitris Kardarakos
>> >
>> >
>>
>


Re: Index protected zip

2018-05-26 Thread Tim Allison
You’ll need to provide a PasswordProvider in the ParseContext.  I don’t
think that is currently possible in the Solr integration. Please open a
ticket if SolrJ doesn’t meet your needs.

On Thu, May 24, 2018 at 1:03 PM Alexandre Rafalovitch 
wrote:

> Hmm. If it works, then it is Tika magic, which may mean Tika has a
> setting for passwords, which would need to be configured and then exposed
> through Solr.
>
> So, I would check if you can extract text with Tika standalone first.
>
> Regards,
> Alex
>
> On Thu, May 24, 2018, 5:05 AM Dimitris Kardarakos, <
> dimitris.kardara...@iteam.gr> wrote:
>
> > Hello everyone.
> >
> > In Solr 7.3.0 I can successfully index the content of zip files.
> >
> > But if the zip file is password protected, running something like the
> > below:
> >
> > curl "http://localhost:8983/solr/sample/update/extract?commit=true&literal.id=enc.zip&resource.password=1234" \
> >   -H "Content-Type: application/zip" --data-binary @enc.zip
> >
> > only the names of the files contained are indexed.
> >
> > Is it a known issue, or am I doing something wrong?
> >
> > Thanks!
> >
> > --
> > Dimitris Kardarakos
> >
> >
>


Re: Index protected zip

2018-05-24 Thread Alexandre Rafalovitch
Hmm. If it works, then it is Tika magic, which may mean Tika has a
setting for passwords, which would need to be configured and then exposed
through Solr.

So, I would check if you can extract text with Tika standalone first.
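
For example, a quick standalone check with the tika-app jar (the version
number is an assumption; run it with --help to see whether your version
exposes a password option):

java -jar tika-app-1.18.jar --text enc.zip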

Regards,
Alex

On Thu, May 24, 2018, 5:05 AM Dimitris Kardarakos, <
dimitris.kardara...@iteam.gr> wrote:

> Hello everyone.
>
> In Solr 7.3.0 I can successfully index the content of zip files.
>
> But if the zip file is password protected, running something like the
> below:
>
> curl "http://localhost:8983/solr/sample/update/extract?commit=true&literal.id=enc.zip&resource.password=1234" \
>   -H "Content-Type: application/zip" --data-binary @enc.zip
>
> only the names of the files contained are indexed.
>
> Is it a known issue, or am I doing something wrong?
>
> Thanks!
>
> --
> Dimitris Kardarakos
>
>


Re: Index filename while indexing JSON file

2018-05-23 Thread Shawn Heisey
On 5/18/2018 1:47 PM, S.Ashwath wrote:
> I have 2 directories: 1 with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each CSV file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the data import
> handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema, and posted the
> fields to the schema, as explained in the official Solr tutorial), I don't
> know how to get the file name into the index as a field. As far as I know,
> there's no way to use the Data Import Handler for JSON files. I've read that
> I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.

The indexing tools included with Solr are good for simple use cases. 
They're generic tools with limits.

The bin/post tool calls a class that is literally called
SimplePostTool.  It is never going to have a lot of capability.

The dataimport handler, while certainly capable of far more than the
simple post tool, is somewhat rigid in its operation. 

A sizable percentage of Solr users end up writing their own indexing
software because what's included with Solr isn't capable of adjusting to
their needs.  Your situation sounds like one that is going to require
custom indexing software that you or somebody in your company must write.
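
As one possible shape for such a tool, a shell sketch that injects each file's
name into its JSON before posting (it assumes one JSON object per file, jq
installed, and a core named 'mycore' -- all placeholders):

for f in /path/to/json/*.json; do
  jq --arg fn "$(basename "$f")" '. + {filename: $fn}' "$f" \
    | curl "http://localhost:8983/solr/mycore/update/json/docs" \
           -H 'Content-Type: application/json' --data-binary @-
done
curl "http://localhost:8983/solr/mycore/update?commit=true"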

Thanks,
Shawn



Re: Index filename while indexing JSON file

2018-05-22 Thread Bernd Fehling

I don't know if DIH can solve your problem, but I would go for
a simple self-programmed ETL in Java and use SolrJ for loading.

Best regards,
Bernd


Am 18.05.2018 um 21:47 schrieb S.Ashwath:

Hello,

I have 2 directories: 1 with txt files and the other with corresponding
JSON (metadata) files (around 9 of each). There is one JSON file for
each CSV file, and they share the same name (they don't share any other
fields).

The txt files just have plain text. I mapped each line to a field called
'sentence' and included the file name as a field using the data import
handler. No problems here.

The JSON file has metadata: 3 tags: a URL, author and title (for the
content in the corresponding txt file).
When I index the JSON file (I just used the _default schema, and posted the
fields to the schema, as explained in the official Solr tutorial), I don't
know how to get the file name into the index as a field. As far as I know,
there's no way to use the Data Import Handler for JSON files. I've read that
I can pass a literal through the bin/post tool, but again, as far as I
understand, I can't pass in the file name dynamically as a literal.

I NEED to get the file name, it is the only way in which I can associate
the metadata with each sentence in the txt files in my downstream Python
code.

So if anybody has a suggestion about how I should index the JSON file name
along with the JSON content (or even some workaround), I'd be eternally
grateful.

Regards,

Ash



Re: Index filename while indexing JSON file

2018-05-21 Thread S.Ashwath
Thanks Raymond. As I was doing the indexing of other delimited files
directly with Solr and the terminal (without a client), I thought it would
be possible to index the filename of JSON files this way as well.
But like you say, I'm parsing the search results in Python. So I might as
well build the index through Python as well. I might have to explore
something like Pysolr.

Thanks again!

On 21 May 2018 at 02:49, Raymond Xie  wrote:

> Would you consider including the filename as another metadata field to
> be indexed? I think your downstream Python can do that easily.
>
>
> Sincerely yours,
>
> Raymond
>
> On Fri, May 18, 2018 at 3:47 PM, S.Ashwath  wrote:
>
> > Hello,
> >
> > I have 2 directories: 1 with txt files and the other with corresponding
> > JSON (metadata) files (around 9 of each). There is one JSON file for
> > each CSV file, and they share the same name (they don't share any other
> > fields).
> >
> > The txt files just have plain text. I mapped each line to a field called
> > 'sentence' and included the file name as a field using the data import
> > handler. No problems here.
> >
> > The JSON file has metadata: 3 tags: a URL, author and title (for the
> > content in the corresponding txt file).
> > When I index the JSON file (I just used the _default schema, and posted
> the
> > fields to the schema, as explained in the official Solr tutorial), I don't
> > know how to get the file name into the index as a field. As far as I know,
> > there's no way to use the Data Import Handler for JSON files. I've read that
> > I can pass a literal through the bin/post tool, but again, as far as I
> > understand, I can't pass in the file name dynamically as a literal.
> >
> > I NEED to get the file name, it is the only way in which I can associate
> > the metadata with each sentence in the txt files in my downstream Python
> > code.
> >
> > So if anybody has a suggestion about how I should index the JSON file
> name
> > along with the JSON content (or even some workaround), I'd be eternally
> > grateful.
> >
> > Regards,
> >
> > Ash
> >
>


Re: Index filename while indexing JSON file

2018-05-20 Thread Raymond Xie
Would you consider including the filename as another metadata field to
be indexed? I think your downstream Python can do that easily.


Sincerely yours,

Raymond

On Fri, May 18, 2018 at 3:47 PM, S.Ashwath  wrote:

> Hello,
>
> I have 2 directories: 1 with txt files and the other with corresponding
> JSON (metadata) files (around 9 of each). There is one JSON file for
> each CSV file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text. I mapped each line to a field called
> 'sentence' and included the file name as a field using the data import
> handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema, and posted the
> fields to the schema, as explained in the official Solr tutorial), I don't
> know how to get the file name into the index as a field. As far as I know,
> there's no way to use the Data Import Handler for JSON files. I've read that
> I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.
>
> Regards,
>
> Ash
>


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-19 Thread Alessandro Benedetti
Hi David,
good to know that sorting solved your problem.
I understand perfectly that given the urgency of your situation, having the
solution ready takes priority over continuing with the investigations.

I would recommend anyway to open a Jira issue in Apache Solr with all the
information gathered so far.
Your situation caught our attention and definitely changing the order of the
documents in input shouldn't affect the index size (at least not by such a large
factor).
The fact that the optimize didn't change anything is even more suspicious.
It may be an indicator that in some edge cases the ordering of input documents
is affecting one of the index data structures.
As a last thing, when you have time I would suggest the following:

1) index the ordering which gives you a small index - Optimize - Take note
of the size by index file extension

2) index the ordering which gives you a big index - Optimize - Take note of
the size by index file extension

And attach that to the Jira issue.
Whenever someone picks it up, that would definitely help.
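
A quick way to total the sizes by file extension from an index directory (the
path is a placeholder; this just sums the ls -l sizes per extension):

ls -l /var/solr/data/index | awk 'NF>=9 { n=split($NF,a,"."); sz[a[n]]+=$5 }
  END { for (e in sz) printf "%-4s %12d\n", e, sz[e] }'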

Cheers




-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-18 Thread Howe, David

Hi Erick & Alessandro,

I have solved my problem by re-ordering the data in the SQL query.  I don't 
know why it works, but it does.  I can consistently reproduce the problem
without changing anything else except the database table.  As our Solr build is
scripted and we always build a new Solr server from scratch, I'm pretty
confident that the defaults haven't changed between test runs; besides, when we create
the Solr index, Solr doesn't know what order the data in the database table is 
in.

I did try removing the geo location field to see if that made a difference, and 
it didn't.

Due to project commitments, I don't have any time to investigate this further 
at the moment.  When/if things quiet down I may see if I can reproduce the 
problem with a smaller number of records loaded from a flat file to make it 
easier to share a project that shows the problem occurring.

Thanks again for all of your assistance and suggestions.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Gora Mohanty
On 18 February 2018 at 08:18, @Nandan@ 
wrote:

> Thanks Rick.
> Is it possible to get links to some demo videos or web pages with real
> examples, so that I can get an overview?
> That way I would be able to understand things in more detail.
>

Searching Google for "Solr index data database" turns up many links with
examples, e.g.,
http://blog.comperiosearch.com/blog/2014/08/28/indexing-database-using-solr/
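
Once a DataImportHandler is configured for a core, imports are triggered over
HTTP; a sketch (the core name is a placeholder):

curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=true"
curl "http://localhost:8983/solr/mycore/dataimport?command=status"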

Regards,
Gora


Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Thanks Rick.
Is it possible to get links to some demo videos or web pages with real
examples, so that I can get an overview?
That way I would be able to understand things in more detail.


On Feb 18, 2018 4:11 AM, "Rick Leir"  wrote:

> Nandan
> Work backwards from your results screen. When a user has done a search,
> what information would you like to appear on the screen?
>
> That tells you what your Solr document needs to contain. How will you get
> that information into the Solr document? You will do the SQL select(s) as
> necessary, get the info from MySQL,  build a flat JSON record containing
> all that info for one document, and POST it to Solr. Repeat for all
> documents. Do a commit. Sorry, I left out all the details! Cheers -- Rick
>
> On February 17, 2018 12:56:59 PM EST, "@Nandan@" <
> nandanpriyadarshi...@gmail.com> wrote:
> >Hi David,
> >Thanks for your reply.
> >My questions are:
> >1) Do I have to denormalize my MySQL data manually, or is there some
> >process for it?
> >2) When data is inserted into MySQL, will it automatically be indexed
> >into Solr?
> >
> >Please explain these.
> >Thanks
> >
> >On Feb 18, 2018 1:51 AM, "David Hastings"  wrote:
> >
> >> Your first step is to denormalize your data into a flat data
> >structure.
> >> Then index that into your solr instance. Then you’re done
> >>
> >> On Feb 17, 2018, at 12:16 PM, @Nandan@
> > >> > wrote:
> >>
> >> Hi Team,
> >> I am working on an e-commerce project in which my data is stored in a
> >> MySQL DB.
> >> We are currently using MySQL search, but we are planning to implement Solr
> >> search to offer our customers more features.
> >> Just for development purposes, I am experimenting on localhost.
> >> Please guide me on how I can achieve this, and please provide some links
> >> I can refer to in order to learn the details from scratch.
> >>
> >> Thanks and Best Regards,
> >> Nandan Priyadarshi
> >>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread Rick Leir
Nandan
Work backwards from your results screen. When a user has done a search, what 
information would you like to appear on the screen?

That tells you what your Solr document needs to contain. How will you get that 
information into the Solr document? You will do the SQL select(s) as necessary, 
get the info from MySQL,  build a flat JSON record containing all that info for 
one document, and POST it to Solr. Repeat for all documents. Do a commit. 
Sorry, I left out all the details! Cheers -- Rick
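
P.S. A compressed sketch of that flow in shell, assuming MySQL 5.7+ (for
JSON_OBJECT), jq for wrapping the rows into a JSON array, and invented table
and core names:

mysql -N -B mydb -e "SELECT JSON_OBJECT('id', p.id, 'name', p.name,
    'category', c.name) FROM products p JOIN categories c ON c.id = p.category_id" \
  | jq -s . \
  | curl "http://localhost:8983/solr/products/update/json/docs?commit=true" \
         -H 'Content-Type: application/json' --data-binary @-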

On February 17, 2018 12:56:59 PM EST, "@Nandan@" 
 wrote:
>Hi David,
>Thanks for your reply.
>My questions are:
>1) Do I have to denormalize my MySQL data manually, or is there some
>process for it?
>2) When data is inserted into MySQL, will it automatically be indexed
>into Solr?
>
>Please explain these.
>Thanks
>
>On Feb 18, 2018 1:51 AM, "David Hastings"  wrote:
>
>> Your first step is to denormalize your data into a flat data
>structure.
>> Then index that into your solr instance. Then you’re done
>>
>> On Feb 17, 2018, at 12:16 PM, @Nandan@
>> > wrote:
>>
>> Hi Team,
>> I am working on an e-commerce project in which my data is stored in a
>> MySQL DB.
>> We are currently using MySQL search, but we are planning to implement Solr
>> search to offer our customers more features.
>> Just for development purposes, I am experimenting on localhost.
>> Please guide me on how I can achieve this, and please provide some links
>> I can refer to in order to learn the details from scratch.
>>
>> Thanks and Best Regards,
>> Nandan Priyadarshi
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread @Nandan@
Hi David,
Thanks for your reply.
My questions are:
1) Do I have to denormalize my MySQL data manually, or is there some process
for it?
2) When data is inserted into MySQL, will it automatically be indexed
into Solr?

Please explain these.
Thanks

On Feb 18, 2018 1:51 AM, "David Hastings"  wrote:

> Your first step is to denormalize your data into a flat data structure.
> Then index that into your solr instance. Then you’re done
>
> On Feb 17, 2018, at 12:16 PM, @Nandan@  > wrote:
>
> Hi Team,
> I am working on an e-commerce project in which my data is stored in a
> MySQL DB.
> We are currently using MySQL search, but we are planning to implement Solr
> search to offer our customers more features.
> Just for development purposes, I am experimenting on localhost.
> Please guide me on how I can achieve this, and please provide some links
> I can refer to in order to learn the details from scratch.
>
> Thanks and Best Regards,
> Nandan Priyadarshi
>


Re: Index data from mysql DB to Solr - From Scratch

2018-02-17 Thread David Hastings
Your first step is to denormalize your data into a flat data structure. Then 
index that into your solr instance. Then you’re done

On Feb 17, 2018, at 12:16 PM, @Nandan@ 
> wrote:

Hi Team,
I am working on an e-commerce project in which my data is stored in a
MySQL DB.
We are currently using MySQL search, but we are planning to implement Solr
search to offer our customers more features.
Just for development purposes, I am experimenting on localhost.
Please guide me on how I can achieve this, and please provide some links
I can refer to in order to learn the details from scratch.

Thanks and Best Regards,
Nandan Priyadarshi


Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
I didn't mean to imply that _you'd_ changed things, the _defaults_ may
have changed. So the "string" fieldType may be defined with
docValues="true" in your new schema and "false" in your old schema
without you intentionally changing anything at _all_.

That's why the LukeRequestHandler will help, because it tells you
what's _there_ regardless of how it got there...
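
A quick way to pull that, using the core name from earlier in the thread:

curl "http://localhost:8983/solr/address/admin/luke?numTerms=0&wt=json"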

Best,
Erick

On Fri, Feb 16, 2018 at 1:37 PM, Howe, David  wrote:
>
> Hi Erick,
>
> I'm 99% sure that I haven't changed the field types between the two snapshots 
> as all of my test runs are completely scripted and build a new Solr server 
> from scratch (both the virtual machine and the Solr software).  I can diff 
> the scripts between two runs to make sure I haven't accidentally changed 
> anything, and I have done this.
>
> The only difference is that I added docValues=false to all of the fields that 
> are indexed=false and stored=true in the run that is smaller.  I had tested 
> this previously with the data in the order that makes the index larger and it 
> only made a minor difference (see one of my previous posts).  Unfortunately, 
> I hadn't added the change to log the file sizes when I did that run, but it 
> definitely didn't fix the problem.
>
> I need to try and get my project back on track now, so I will concentrate on 
> the "fix" that I have and perhaps re-run some other scenarios when I have 
> more time.
>
> Thanks again for your help.
>
> Regards,
>
> David
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T  0391067904
>
> M  0424036591
>
> E  david.h...@auspost.com.au
>
> W  auspost.com.au
> W  startrack.com.au
>


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David

Hi Erick,

I'm 99% sure that I haven't changed the field types between the two snapshots 
as all of my test runs are completely scripted and build a new Solr server from 
scratch (both the virtual machine and the Solr software).  I can diff the 
scripts between two runs to make sure I haven't accidentally changed anything, 
and I have done this.

The only difference is that I added docValues=false to all of the fields that 
are indexed=false and stored=true in the run that is smaller.  I had tested 
this previously with the data in the order that makes the index larger and it 
only made a minor difference (see one of my previous posts).  Unfortunately, I 
hadn't added the change to log the file sizes when I did that run, but it 
definitely didn't fix the problem.
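
For reference, a sketch of how such a change can be expressed through the
Schema API, in the same style as the field definitions quoted elsewhere in the
thread (the field name and type here are placeholders):

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "replace-field":{
     "name":"someStoredField",
     "type":"string",
     "stored":true,
     "indexed":false,
     "docValues":false
  }
}' http://localhost:8983/solr/address/schema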

I need to try and get my project back on track now, so I will concentrate on 
the "fix" that I have and perhaps re-run some other scenarios when I have more 
time.

Thanks again for your help.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Erick Erickson
Well, I'm not entirely sure either ;)

Here's what I'm seeing (and, BTW, I'm making a couple of assumptions here): in
the one listing, your biggest segment starts with _7l and in the other
it's _zd. The aggregate size is
2,815M for _7l and 705M for _zd. So multiplying the individual files
in _zd by 4 (poor-man's normalization) I get these major differences:

ext    _7l(M)    _zd(M)
dim    84        200      These are points fields.
fdt    1,431     1,000    These are stored data; the discrepancy here goes the "other" way.
pos    335       480      position information.
dvd    165       400      docValues
tim    34        80       terms dictionary

I don't think the fdt or pos fields matter all that much, they're
"close enough". That said, I'd guess you have some position
information turned on in the more recent Solr that wasn't in the old
one.

Points and dvd fields are much more interesting, as well as terms dictionary.

I doubt you've consciously changed the field types but some of the
_defaults_ have changed in the fieldType definitions. Perhaps that
accounts for some of the difference? That's why I was curious about
what the LukeRequestHandler (or Luke itself) show's for each field
type. Those tools show you what's actually _in_ the index's metadata,
not just what is in the schema file.

As for the sorting, I'll have to defer to the people who understand
how spatial data is stored...

Best,
Erick

On Fri, Feb 16, 2018 at 11:37 AM, Howe, David  wrote:
>
> Hi Erick,
>
> Below is the file listing for when the index is loaded with the table ordered 
> in a way that produces the smaller index.
>
> I have checked the console, and we have no deleted docs and we have the same 
> number of docs in the index as there are rows in the staging table that we 
> load from.  I would be surprised if this wasn't the case as we use the 
> primary key from the staging table as the id in Solr, so it is pretty much 
> guaranteed to be unique.  The primary key in the staging table is a 
> NUMBER(10, 0) column which contains the row number in Oracle, so it starts 
> from 1 and goes up to 14,061,990.  We load the index in row number order.
>
> When we get the larger sized index, the table is sequenced by a field named 
> DPID which is a NUMBER(10, 0) in Oracle.  The corresponding Solr definition 
> for that field is:
>
>   curl -X POST -H 'Content-type:application/json' --data-binary '{
> "add-field":{
>"name":"dpid",
>"type":"pint",
>"stored":true,
>"indexed": true
> }
>   }' http://localhost:8983/solr/address/schema
>
> When we get the smaller sized index, the table is sequenced by locality 
> (VARCHAR2(80)) and then postcode (VARCHAR2(4)).  The corresponding Solr 
> definition for these fields is:
>
>   echo "$(date) Creating locality field"
>   curl -X POST -H 'Content-type:application/json' --data-binary '{
> "add-field":{
>"name":"locality",
>"type":"locality",
>"stored":true,
>"indexed":true
> }
>   }' http://localhost:8983/solr/address/schema
>
>   echo "$(date) Creating postcode field"
>   curl -X POST -H 'Content-type:application/json' --data-binary '{
> "add-field":{
>"name":"postcode",
>"type":"pint",
>"stored":true,
>"indexed":true
> }
>   }' http://localhost:8983/solr/address/schema
>
> Not sure if this helps or not.
>
> Regards,
>
> David
>
> total 5300812
> -rw-r--r-- 1 solr solr97 Feb 16 04:12 _14o.dii
> -rw-r--r-- 1 solr solr  45400325 Feb 16 04:12 _14o.dim
> -rw-r--r-- 1 solr solr 221114041 Feb 16 04:10 _14o.fdt
> -rw-r--r-- 1 solr solr286434 Feb 16 04:10 _14o.fdx
> -rw-r--r-- 1 solr solr  6370 Feb 16 04:12 _14o.fnm
> -rw-r--r-- 1 solr solr  17379224 Feb 16 04:12 _14o.nvd
> -rw-r--r-- 1 solr solr   463 Feb 16 04:12 _14o.nvm
> -rw-r--r-- 1 solr solr   620 Feb 16 04:12 _14o.si
> -rw-r--r-- 1 solr solr 147867580 Feb 16 04:11 _14o_Lucene50_0.doc
> -rw-r--r-- 1 solr solr 111291706 Feb 16 04:11 _14o_Lucene50_0.pos
> -rw-r--r-- 1 solr solr  18793856 Feb 16 04:11 _14o_Lucene50_0.tim
> -rw-r--r-- 1 solr solr360329 Feb 16 04:11 _14o_Lucene50_0.tip
> -rw-r--r-- 1 solr solr  91972283 Feb 16 04:12 _14o_Lucene70_0.dvd
> -rw-r--r-- 1 solr solr  4173 Feb 16 04:12 _14o_Lucene70_0.dvm
> -rw-r--r-- 1 solr solr   405 Feb 16 04:20 _16l.cfe
> -rw-r--r-- 1 solr solr  10956277 Feb 16 04:20 _16l.cfs
> -rw-r--r-- 1 solr solr   455 Feb 16 04:20 _16l.si
> -rw-r--r-- 1 solr solr   405 Feb 16 04:30 _18t.cfe
> -rw-r--r-- 1 solr solr  11619394 Feb 16 04:30 _18t.cfs
> -rw-r--r-- 1 solr solr   455 Feb 16 04:30 _18t.si
> -rw-r--r-- 1 solr solr97 Feb 16 04:34 _19e.dii
> -rw-r--r-- 1 solr solr  39424990 Feb 16 04:34 _19e.dim
> -rw-r--r-- 1 solr solr 188005197 Feb 16 04:33 _19e.fdt
> -rw-r--r-- 1 solr solr249160 Feb 16 04:33 _19e.fdx
> -rw-r--r-- 1 solr solr  6370 Feb 16 04:34 _19e.fnm
> -rw-r--r-- 1 solr solr  14660427 Feb 16 04:34 

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David

Hi Erick,

Thinking some more about the differences between the two sort orders has 
suggested another possibility.  We also have a geo spatial field defined in the 
index:

  echo "$(date) Creating geoLocation field"
  curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
   "name":"geoLocation",
   "type":"location",
   "stored":true,
   "indexed":true
}
  }' http://localhost:8983/solr/address/schema

One of the differences between the two sort orders is that when the data is 
sorted by locality and post code, it means that addresses that are close to 
each other will be sorted together as both locality and postcode have 
geographic meaning.  So when they are indexed, they will be indexed in groups 
of addresses that are quite near to each other.

When the data is sorted by DPID, the order is near random as the dpid has no 
meaning at all, so the geo location sequence should be random as well.

I don't have time to test this at the moment, as I need to get my project back 
on track after chasing this performance issue but it might ring a bell with 
somebody.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David

Hi Erick,

Below is the file listing for when the index is loaded with the table ordered 
in a way that produces the smaller index.

I have checked the console, and we have no deleted docs and we have the same 
number of docs in the index as there are rows in the staging table that we load 
from.  I would be surprised if this wasn't the case as we use the primary key 
from the staging table as the id in Solr, so it is pretty much guaranteed to be 
unique.  The primary key in the staging table is a NUMBER(10, 0) column which 
contains the row number in Oracle, so it starts from 1 and goes up to 
14,061,990.  We load the index in row number order.

When we get the larger sized index, the table is sequenced by a field named 
DPID which is a NUMBER(10, 0) in Oracle.  The corresponding Solr definition for 
that field is:

  curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
   "name":"dpid",
   "type":"pint",
   "stored":true,
   "indexed": true
}
  }' http://localhost:8983/solr/address/schema

When we get the smaller sized index, the table is sequenced by locality 
(VARCHAR2(80)) and then postcode (VARCHAR2(4)).  The corresponding Solr 
definition for these fields is:

  echo "$(date) Creating locality field"
  curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
   "name":"locality",
   "type":"locality",
   "stored":true,
   "indexed":true
}
  }' http://localhost:8983/solr/address/schema

  echo "$(date) Creating postcode field"
  curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field":{
   "name":"postcode",
   "type":"pint",
   "stored":true,
   "indexed":true
}
  }' http://localhost:8983/solr/address/schema

Not sure if this helps or not.

Regards,

David

total 5300812
-rw-r--r-- 1 solr solr97 Feb 16 04:12 _14o.dii
-rw-r--r-- 1 solr solr  45400325 Feb 16 04:12 _14o.dim
-rw-r--r-- 1 solr solr 221114041 Feb 16 04:10 _14o.fdt
-rw-r--r-- 1 solr solr286434 Feb 16 04:10 _14o.fdx
-rw-r--r-- 1 solr solr  6370 Feb 16 04:12 _14o.fnm
-rw-r--r-- 1 solr solr  17379224 Feb 16 04:12 _14o.nvd
-rw-r--r-- 1 solr solr   463 Feb 16 04:12 _14o.nvm
-rw-r--r-- 1 solr solr   620 Feb 16 04:12 _14o.si
-rw-r--r-- 1 solr solr 147867580 Feb 16 04:11 _14o_Lucene50_0.doc
-rw-r--r-- 1 solr solr 111291706 Feb 16 04:11 _14o_Lucene50_0.pos
-rw-r--r-- 1 solr solr  18793856 Feb 16 04:11 _14o_Lucene50_0.tim
-rw-r--r-- 1 solr solr360329 Feb 16 04:11 _14o_Lucene50_0.tip
-rw-r--r-- 1 solr solr  91972283 Feb 16 04:12 _14o_Lucene70_0.dvd
-rw-r--r-- 1 solr solr  4173 Feb 16 04:12 _14o_Lucene70_0.dvm
-rw-r--r-- 1 solr solr   405 Feb 16 04:20 _16l.cfe
-rw-r--r-- 1 solr solr  10956277 Feb 16 04:20 _16l.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:20 _16l.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:30 _18t.cfe
-rw-r--r-- 1 solr solr  11619394 Feb 16 04:30 _18t.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:30 _18t.si
-rw-r--r-- 1 solr solr97 Feb 16 04:34 _19e.dii
-rw-r--r-- 1 solr solr  39424990 Feb 16 04:34 _19e.dim
-rw-r--r-- 1 solr solr 188005197 Feb 16 04:33 _19e.fdt
-rw-r--r-- 1 solr solr249160 Feb 16 04:33 _19e.fdx
-rw-r--r-- 1 solr solr  6370 Feb 16 04:34 _19e.fnm
-rw-r--r-- 1 solr solr  14660427 Feb 16 04:34 _19e.nvd
-rw-r--r-- 1 solr solr   463 Feb 16 04:34 _19e.nvm
-rw-r--r-- 1 solr solr   620 Feb 16 04:34 _19e.si
-rw-r--r-- 1 solr solr 131101691 Feb 16 04:33 _19e_Lucene50_0.doc
-rw-r--r-- 1 solr solr  97734855 Feb 16 04:33 _19e_Lucene50_0.pos
-rw-r--r-- 1 solr solr  16502289 Feb 16 04:33 _19e_Lucene50_0.tim
-rw-r--r-- 1 solr solr320224 Feb 16 04:33 _19e_Lucene50_0.tip
-rw-r--r-- 1 solr solr  78801516 Feb 16 04:34 _19e_Lucene70_0.dvd
-rw-r--r-- 1 solr solr  2097 Feb 16 04:34 _19e_Lucene70_0.dvm
-rw-r--r-- 1 solr solr   405 Feb 16 04:35 _19y.cfe
-rw-r--r-- 1 solr solr  78051374 Feb 16 04:35 _19y.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:35 _19y.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:37 _1ai.cfe
-rw-r--r-- 1 solr solr  53311170 Feb 16 04:37 _1ai.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:37 _1ai.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:40 _1b2.cfe
-rw-r--r-- 1 solr solr  70986259 Feb 16 04:40 _1b2.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:40 _1b2.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:41 _1bc.cfe
-rw-r--r-- 1 solr solr  10338200 Feb 16 04:41 _1bc.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:41 _1bc.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:42 _1bm.cfe
-rw-r--r-- 1 solr solr  68074070 Feb 16 04:42 _1bm.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:42 _1bm.si
-rw-r--r-- 1 solr solr   405 Feb 16 04:45 _1c5.cfe
-rw-r--r-- 1 solr solr  67766868 Feb 16 04:45 _1c5.cfs
-rw-r--r-- 1 solr solr   455 Feb 16 04:45 _1c5.si
-rw-r--r-- 1 solr solr91 Feb 16 04:45 _1c6.dii
-rw-r--r-- 1 solr solr666032 Feb 16 04:45 _1c6.dim
-rw-r--r-- 1 solr solr   2515129 Feb 16 04:45 _1c6.fdt

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David

Hi Alessandro,

There are 14,061,990 records in the staging table, and that is how many
documents we end up with in Solr.  I would be surprised if we have a
problem with the id, as we use the primary key of the table as the id in Solr 
so it must be unique.

The primary key of the staging table is a NUMBER(10, 0) in Oracle, and we set 
it to the row number when we are populating the table.  So the id's will start 
at 1 and go up to 14,061,990 and we load the records in id order.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Alessandro Benedetti
It's a silly thing, but to confirm the direction that Erick is suggesting:
how many rows are in the DB?
If updates are happening in Solr (causing the deletes), I would expect a
greater number of documents in the DB than in the Solr index.
Is the DB primary key (if any) the same as the uniqueKey field in Solr?

Regards

--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Fri, Feb 16, 2018 at 10:18 AM, Howe, David 
wrote:

>
> Hi Emir,
>
> We have no copy field definitions.  To keep things simple, we have a one
> to one mapping between the columns in our staging table and the fields in
> our Solr index.
>
> Regards,
>
> David
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T  0391067904
>
> M  0424036591
>
> E  david.h...@auspost.com.au
>
> W  auspost.com.au
> W  startrack.com.au
>
>


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Howe, David

Hi Emir,

We have no copy field definitions.  To keep things simple, we have a one to one 
mapping between the columns in our staging table and the fields in our Solr 
index.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-16 Thread Emir Arnautović
Hi David,
I skimmed through the thread and don't see if this has already been
eliminated, so I will ask: can you check whether there are some copyField
rules that are triggered when the new field is added? You mentioned that
ordering fixed the size of the index, but it might be worth checking.
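
For what it's worth, copyField rules can also be listed programmatically
through the Schema API. A hedged SolrJ sketch (the URL and core name are
placeholders, and this assumes SchemaRequest.CopyFields returns each rule as a
map with "source" and "dest" keys, as the Schema API documents them):

import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class ListCopyFields {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SchemaResponse.CopyFieldsResponse rsp =
                new SchemaRequest.CopyFields().process(client);
            // Print every source -> dest rule so any silent copy stands out.
            for (Map<String, Object> rule : rsp.getCopyFields()) {
                System.out.println(rule.get("source") + " -> " + rule.get("dest"));
            }
        }
    }
}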

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 16 Feb 2018, at 05:05, Erick Erickson  wrote:
> 
> This isn't terribly useful without a similar dump of "the other" index
> directory. The point is to compare the different extensions for some
> segment where the sum of all the files in that segment is roughly
> equal. So if you have a listing of the old index around, that would
> help.
> 
> bq: We don't have any deleted docs in our index, as we always build it
> from a brand new virtual machine with a brand new installation of
> Solr.
> 
> Well, that's an assumption I want to check. Here's the problem. It's
> possible that the ordering bit you're talking about is really masking
> indexing the same uniqueKey multiple times. Since indexing a doc
> with the same uniqueKey just marks the old doc as deleted, the old
> doc will take up room in your index until it's purged during segment
> merging. This is a _really_ long shot mind you, I have a hard time
> believing that this is the root cause here. It's worth checking
> though. Even doing a q=*:* won't help since that doesn't count deleted
> docs. Take a quick glance at the admin overview page for a core and
> check, there is "maxDoc", "deletedDocs" and "numDocs". I expect
> deletedDocs will be zero and numDocs and maxDoc will be your 14M, but
> this problem is so odd that I'm covering as many bases as I can think
> of ;)
> 
> Now, ordering may appear to change things, but that could simply be
> that the deleted docs don't happen to fall in segments that are
> merged. Again, this is unlikely but possible.
> 
> The shortcut here would be to optimize afterwards. In the usual course
> of events this should _not_ be necessary (or even desirable) unless
> you do it every time you build your index for arcane reasons, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
> But if you do optimize (forceMerge) and the size drops back to more
> reasonable levels it would be a clue.
> 
> Ordering simply should not affect the final index size except for,
> possibly, changing the number of deleted docs in the index largely
> through chance. If you do see a dramatic difference, try the optimize
> thing to check.
> 
> If simple ordering _does_ really make a difference (outside of number
> of deleted docs)  my understanding of Solr is going to undergo a
> revision. And we'll probably be raising a JIRA or two ;)
> 
> Now, what I really expect is that the issue is one of two things:
> 1> you have some options turned on now that weren't before, either
> through some innocent-seeming change, a change in the internal
> defaults etc.
> 2> your SQL with the extra field is behaving unexpectedly.
> 
> The proof is of course in the pudding...
> 
> Best,
> Erick
> 
> 
> 
> On Thu, Feb 15, 2018 at 5:15 PM, Howe, David  
> wrote:
>> 
>> Hi Erick,
>> 
>> I have the full dump of the Solr index file sizes as well if that is of any 
>> help.  I have attached it below this message.
>> 
>> We don't have any deleted docs in our index, as we always build it from a 
>> brand new virtual machine with a brand new installation of Solr.
>> 
>> The ordering is definitely making a difference, as I can run the same 
>> indexing configuration over a table with the same data just in different 
>> orders and it produces these vastly different results.  I have been chasing 
>> this for a couple of weeks trying to work out what the difference is when we 
>> just add one extra field.  The difference that I have found is that the 
>> extra field causes the staging table population query to be optimised 
>> differently and to select the records in a different sequence.  When I force 
>> the records back to their original sequence, the index goes back to being 
>> small again.
>> 
>> I'm currently re-building my staging data to try and get it into the same 
>> order as before and including the extra field.  I will post the file sizes 
>> again when I have that result.
>> 
>> Regards,
>> 
>> David
>> 
>> total 14600404
>> -rw-r--r-- 1 solr solr 97 Feb 14 01:34 _7l.dii
>> -rw-r--r-- 1 solr solr   83831801 Feb 14 01:34 _7l.dim
>> -rw-r--r-- 1 solr solr 1431645451 Feb 14 01:33 _7l.fdt
>> -rw-r--r-- 1 solr solr 381994 Feb 14 01:33 _7l.fdx
>> -rw-r--r-- 1 solr solr   6370 Feb 14 01:34 _7l.fnm
>> -rw-r--r-- 1 solr solr   29353048 Feb 14 01:34 _7l.nvd
>> -rw-r--r-- 1 solr solr        463 Feb 14 01:34 _7l.nvm
>> -rw-r--r-- 1 solr solr        606 Feb 14 01:34 _7l.si
>> -rw-r--r-- 1 solr solr  734701117 Feb 14 01:34 _7l_Lucene50_0.doc
>> -rw-r--r-- 1 solr solr  335043096 Feb 14 01:34 

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
This isn't terribly useful without a similar dump of "the other" index
directory. The point is to compare the different extensions for some
segment where the sum of all the files in that segment is roughly
equal. So if you have a listing of the old index around, that would
help.

bq: We don't have any deleted docs in our index, as we always build it
from a brand new virtual machine with a brand new installation of
Solr.

Well, that's an assumption I want to check. Here's the problem. It's
possible that the ordering bit you're talking about is really masking
indexing the same uniqueKey multiple times. Since indexing a doc
with the same uniqueKey just marks the old doc as deleted, the old
doc will take up room in your index until it's purged during segment
merging. This is a _really_ long shot mind you, I have a hard time
believing that this is the root cause here. It's worth checking
though. Even doing a q=*:* won't help since that doesn't count deleted
docs. Take a quick glance at the admin overview page for a core and
check, there is "maxDoc", "deletedDocs" and "numDocs". I expect
deletedDocs will be zero and numDocs and maxDoc will be your 14M, but
this problem is so odd that I'm covering as many bases as I can think
of ;)
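
Those three numbers can also be pulled programmatically via the Luke handler.
A rough SolrJ sketch (the URL and core name are placeholders; this assumes
LukeResponse exposes the index section, with its numDocs/maxDoc/deletedDocs
entries, through getIndexInfo()):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.common.util.NamedList;

public class IndexCounts {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            LukeRequest luke = new LukeRequest();
            luke.setNumTerms(0); // skip per-field term stats; we only want index totals
            LukeResponse rsp = luke.process(client);
            NamedList<Object> index = rsp.getIndexInfo();
            System.out.println("numDocs     = " + index.get("numDocs"));
            System.out.println("maxDoc      = " + index.get("maxDoc"));
            System.out.println("deletedDocs = " + index.get("deletedDocs"));
        }
    }
}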

Now, ordering may appear to change things, but that could simply be
that the deleted docs don't happen to fall in segments that are
merged. Again, this is unlikely but possible.

The shortcut here would be to optimize afterwards. In the usual course
of events this should _not_ be necessary (or even desirable) unless
you do it every time you build your index for arcane reasons, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/.
But if you do optimize (forceMerge) and the size drops back to more
reasonable levels it would be a clue.
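
For completeness, a one-off optimize (forceMerge) can also be issued from
SolrJ; a minimal sketch (placeholder URL and core name, and mind the caveats
in the article linked above):

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ForceMerge {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            // waitFlush=true, waitSearcher=true, maxSegments=1: merges away
            // deleted docs so the on-disk sizes can be compared afterwards.
            client.optimize(true, true, 1);
        }
    }
}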

Ordering simply should not affect the final index size except for,
possibly, changing the number of deleted docs in the index largely
through chance. If you do see a dramatic difference, try the optimize
thing to check.

If simple ordering _does_ really make a difference (outside of number
of deleted docs)  my understanding of Solr is going to undergo a
revision. And we'll probably be raising a JIRA or two ;)

Now, what I really expect is that the issue is one of two things:
1> you have some options turned on now that weren't before, either
through some innocent-seeming change, a change in the internal
defaults etc.
2> your SQL with the extra field is behaving unexpectedly.

The proof is of course in the pudding...

Best,
Erick



On Thu, Feb 15, 2018 at 5:15 PM, Howe, David  wrote:
>
> Hi Erick,
>
> I have the full dump of the Solr index file sizes as well if that is of any 
> help.  I have attached it below this message.
>
> We don't have any deleted docs in our index, as we always build it from a 
> brand new virtual machine with a brand new installation of Solr.
>
> The ordering is definitely making a difference, as I can run the same 
> indexing configuration over a table with the same data just in different 
> orders and it produces these vastly different results.  I have been chasing 
> this for a couple of weeks trying to work out what the difference is when we 
> just add one extra field.  The difference that I have found is that the extra 
> field causes the staging table population query to be optimised differently 
> and to select the records in a different sequence.  When I force the records 
> back to their original sequence, the index goes back to being small again.
>
> I'm currently re-building my staging data to try and get it into the same 
> order as before and including the extra field.  I will post the file sizes 
> again when I have that result.
>
> Regards,
>
> David
>
> total 14600404
> -rw-r--r-- 1 solr solr 97 Feb 14 01:34 _7l.dii
> -rw-r--r-- 1 solr solr   83831801 Feb 14 01:34 _7l.dim
> -rw-r--r-- 1 solr solr 1431645451 Feb 14 01:33 _7l.fdt
> -rw-r--r-- 1 solr solr 381994 Feb 14 01:33 _7l.fdx
> -rw-r--r-- 1 solr solr   6370 Feb 14 01:34 _7l.fnm
> -rw-r--r-- 1 solr solr   29353048 Feb 14 01:34 _7l.nvd
> -rw-r--r-- 1 solr solr        463 Feb 14 01:34 _7l.nvm
> -rw-r--r-- 1 solr solr        606 Feb 14 01:34 _7l.si
> -rw-r--r-- 1 solr solr  734701117 Feb 14 01:34 _7l_Lucene50_0.doc
> -rw-r--r-- 1 solr solr  335043096 Feb 14 01:34 _7l_Lucene50_0.pos
> -rw-r--r-- 1 solr solr   34248274 Feb 14 01:34 _7l_Lucene50_0.tim
> -rw-r--r-- 1 solr solr 624945 Feb 14 01:34 _7l_Lucene50_0.tip
> -rw-r--r-- 1 solr solr  165958502 Feb 14 01:34 _7l_Lucene70_0.dvd
> -rw-r--r-- 1 solr solr   2581 Feb 14 01:34 _7l_Lucene70_0.dvm
> -rw-r--r-- 1 solr solr        405 Feb 14 01:46 _9p.cfe
> -rw-r--r-- 1 solr solr   38776749 Feb 14 01:46 _9p.cfs
> -rw-r--r-- 1 solr solr        452 Feb 14 01:46 _9p.si
> -rw-r--r-- 1 solr solr 97 Feb 14 02:07 _cm.dii
> -rw-r--r-- 1 solr solr   83111509 Feb 14 02:07 _cm.dim
> -rw-r--r-- 1 solr solr 1419981112 Feb 14 02:02 _cm.fdt
> -rw-r--r-- 1 solr solr   

RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David

Hi Erick,

I have the full dump of the Solr index file sizes as well if that is of any 
help.  I have attached it below this message.

We don't have any deleted docs in our index, as we always build it from a brand 
new virtual machine with a brand new installation of Solr.

The ordering is definitely making a difference, as I can run the same indexing 
configuration over a table with the same data just in different orders and it 
produces these vastly different results.  I have been chasing this for a couple 
of weeks trying to work out what the difference is when we just add one extra 
field.  The difference that I have found is that the extra field causes the 
staging table population query to be optimised differently and to select the 
records in a different sequence.  When I force the records back to their 
original sequence, the index goes back to being small again.

I'm currently re-building my staging data to try and get it into the same order 
as before and including the extra field.  I will post the file sizes again when 
I have that result.

Regards,

David

total 14600404
-rw-r--r-- 1 solr solr 97 Feb 14 01:34 _7l.dii
-rw-r--r-- 1 solr solr   83831801 Feb 14 01:34 _7l.dim
-rw-r--r-- 1 solr solr 1431645451 Feb 14 01:33 _7l.fdt
-rw-r--r-- 1 solr solr 381994 Feb 14 01:33 _7l.fdx
-rw-r--r-- 1 solr solr   6370 Feb 14 01:34 _7l.fnm
-rw-r--r-- 1 solr solr   29353048 Feb 14 01:34 _7l.nvd
-rw-r--r-- 1 solr solr        463 Feb 14 01:34 _7l.nvm
-rw-r--r-- 1 solr solr        606 Feb 14 01:34 _7l.si
-rw-r--r-- 1 solr solr  734701117 Feb 14 01:34 _7l_Lucene50_0.doc
-rw-r--r-- 1 solr solr  335043096 Feb 14 01:34 _7l_Lucene50_0.pos
-rw-r--r-- 1 solr solr   34248274 Feb 14 01:34 _7l_Lucene50_0.tim
-rw-r--r-- 1 solr solr 624945 Feb 14 01:34 _7l_Lucene50_0.tip
-rw-r--r-- 1 solr solr  165958502 Feb 14 01:34 _7l_Lucene70_0.dvd
-rw-r--r-- 1 solr solr   2581 Feb 14 01:34 _7l_Lucene70_0.dvm
-rw-r--r-- 1 solr solr        405 Feb 14 01:46 _9p.cfe
-rw-r--r-- 1 solr solr   38776749 Feb 14 01:46 _9p.cfs
-rw-r--r-- 1 solr solr        452 Feb 14 01:46 _9p.si
-rw-r--r-- 1 solr solr 97 Feb 14 02:07 _cm.dii
-rw-r--r-- 1 solr solr   83111509 Feb 14 02:07 _cm.dim
-rw-r--r-- 1 solr solr 1419981112 Feb 14 02:02 _cm.fdt
-rw-r--r-- 1 solr solr 379544 Feb 14 02:02 _cm.fdx
-rw-r--r-- 1 solr solr   6370 Feb 14 02:07 _cm.fnm
-rw-r--r-- 1 solr solr   29049434 Feb 14 02:07 _cm.nvd
-rw-r--r-- 1 solr solr        463 Feb 14 02:07 _cm.nvm
-rw-r--r-- 1 solr solr        606 Feb 14 02:07 _cm.si
-rw-r--r-- 1 solr solr  728509370 Feb 14 02:07 _cm_Lucene50_0.doc
-rw-r--r-- 1 solr solr  332343997 Feb 14 02:07 _cm_Lucene50_0.pos
-rw-r--r-- 1 solr solr   34361884 Feb 14 02:07 _cm_Lucene50_0.tim
-rw-r--r-- 1 solr solr 658404 Feb 14 02:07 _cm_Lucene50_0.tip
-rw-r--r-- 1 solr solr  164612509 Feb 14 02:07 _cm_Lucene70_0.dvd
-rw-r--r-- 1 solr solr   2581 Feb 14 02:07 _cm_Lucene70_0.dvm
-rw-r--r-- 1 solr solr        405 Feb 14 02:09 _fb.cfe
-rw-r--r-- 1 solr solr   44333425 Feb 14 02:09 _fb.cfs
-rw-r--r-- 1 solr solr        452 Feb 14 02:09 _fb.si
-rw-r--r-- 1 solr solr 97 Feb 14 02:24 _h2.dii
-rw-r--r-- 1 solr solr   77079684 Feb 14 02:24 _h2.dim
-rw-r--r-- 1 solr solr 1304390074 Feb 14 02:22 _h2.fdt
-rw-r--r-- 1 solr solr 347494 Feb 14 02:22 _h2.fdx
-rw-r--r-- 1 solr solr   6370 Feb 14 02:24 _h2.fnm
-rw-r--r-- 1 solr solr   26756876 Feb 14 02:24 _h2.nvd
-rw-r--r-- 1 solr solr        463 Feb 14 02:24 _h2.nvm
-rw-r--r-- 1 solr solr        606 Feb 14 02:24 _h2.si
-rw-r--r-- 1 solr solr  669875920 Feb 14 02:24 _h2_Lucene50_0.doc
-rw-r--r-- 1 solr solr  305954906 Feb 14 02:24 _h2_Lucene50_0.pos
-rw-r--r-- 1 solr solr   32019733 Feb 14 02:24 _h2_Lucene50_0.tim
-rw-r--r-- 1 solr solr 619562 Feb 14 02:24 _h2_Lucene50_0.tip
-rw-r--r-- 1 solr solr  151772808 Feb 14 02:24 _h2_Lucene70_0.dvd
-rw-r--r-- 1 solr solr   2497 Feb 14 02:24 _h2_Lucene70_0.dvm
-rw-r--r-- 1 solr solr        405 Feb 14 02:45 _mx.cfe
-rw-r--r-- 1 solr solr  277937779 Feb 14 02:45 _mx.cfs
-rw-r--r-- 1 solr solr        452 Feb 14 02:45 _mx.si
-rw-r--r-- 1 solr solr 97 Feb 14 02:47 _n9.dii
-rw-r--r-- 1 solr solr   82335510 Feb 14 02:47 _n9.dim
-rw-r--r-- 1 solr solr 1400595065 Feb 14 02:46 _n9.fdt
-rw-r--r-- 1 solr solr 374259 Feb 14 02:46 _n9.fdx
-rw-r--r-- 1 solr solr   6370 Feb 14 02:47 _n9.fnm
-rw-r--r-- 1 solr solr   28775974 Feb 14 02:47 _n9.nvd
-rw-r--r-- 1 solr solr        463 Feb 14 02:47 _n9.nvm
-rw-r--r-- 1 solr solr        606 Feb 14 02:47 _n9.si
-rw-r--r-- 1 solr solr  719183309 Feb 14 02:46 _n9_Lucene50_0.doc
-rw-r--r-- 1 solr solr  328214265 Feb 14 02:46 _n9_Lucene50_0.pos
-rw-r--r-- 1 solr solr   34098919 Feb 14 02:46 _n9_Lucene50_0.tim
-rw-r--r-- 1 solr solr 654313 Feb 14 02:46 _n9_Lucene50_0.tip
-rw-r--r-- 1 solr solr  163220960 Feb 14 02:46 _n9_Lucene70_0.dvd
-rw-r--r-- 1 solr solr   2560 Feb 14 02:46 _n9_Lucene70_0.dvm
-rw-r--r-- 1 solr solr        405 Feb 14 02:52 _ns.cfe

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Erick Erickson
David:

Rats, the cfs files make everything I'd hoped to understand from the
sizes ambiguous, since they conceal the underlying sizes of the other
extensions. We can approach it a bit differently though. Take one
segment that's _not_ in cfs format where the total size of all files
making up that segment is near 5GB (the default max segment size) and
compare the individual file sizes for that segment only. What I'm hoping
to find out, of course, is which extensions vary dramatically. But
let's assume for the nonce that the numbers you already have are
comparable if we ignore the .cfs files.

.doc   1094.68 -> 2767.53  - term frequencies
.fdt   1633.21 -> 5387.92  - stored data
.pos    809.23 -> 1272.70  - position information

So the file differences (if borne out) indicate the following:

- .doc: you have more documents, more terms, or different options on
your terms [1]
- .fdt: you're storing more fields than you used to [1]
- .pos: you have more docs, more terms, or position information
turned on where you didn't have it before [1]

[1] or lots of deleted docs that haven't been merged away. This
information should be on the admin page for any particular core. I
think this is unlikely, but who knows? NOTE: just because you get 14M
from querying *:* does _not_ say anything about the deleted docs, which
take up space. This is highly unlikely to be your problem, but let's
eliminate the easy stuff ;)

Where I'd go from here, after checking that these ratios are true for a
single like-sized segment in both cases:

1> the LukeRequestHandler can tell you information about exactly how
the index is defined, and using Luke itself can provide you a much
more detailed look at what's actually _in_ your index. You could also
have Luke reconstruct the same doc from your index in each case and
compare. Perhaps your SQL is doing something really unexpected. This
_should_ show you the realized meta-data for each field and let you
pinpoint any different options that have been enabled (a request
sketch follows this list).

2> compare your Oracle intermediate tables: are they _really_
identical? The ordering shouldn't make any difference at all to Solr,
assuming the same docs are being indexed (plus any expected delta).
There's an edge case I can imagine if you hit a "perfect storm" and
one version has a lot more deleted docs than the other, possibly as
a result of reordering, but that's unlikely. The edge case I'm
imagining would be easily verifiable by the two versions having a
radically different number of deleted docs.
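
As referenced in 1> above, a small sketch of pulling the realized schema from
the Luke handler via SolrJ (placeholder URL and core name; this assumes the
handler's show=schema response carries the per-field metadata under a
"schema" key):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class DumpLukeSchema {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set("show", "schema"); // realized per-field metadata and flags
            NamedList<Object> rsp = client.request(
                new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/luke", params));
            System.out.println(rsp.get("schema"));
        }
    }
}

Run it once against each index and diff the output; any per-field option that
differs between the two builds should jump out.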

Best,
Erick




On Thu, Feb 15, 2018 at 7:13 AM, Pratik Patel  wrote:
> @Alessandro I will see if I can reproduce the same issue just by turning
> off omitNorms on field type. I'll open another mail thread if required.
> Thanks.
>
> On Thu, Feb 15, 2018 at 6:12 AM, Howe, David 
> wrote:
>
>>
>> Hi Alessandro,
>>
>> Some interesting testing today that seems to have gotten me closer to what
>> the issue is.  When I run the version of the index that is working
>> correctly against my database table that has the extra field in it, the
>> index suddenly increases in size.  This is even though the data importer is
>> running the same SELECT as before (which doesn't include the extra column)
>> and loads the same number of rows.
>>
>> After scratching my head for a bit and browsing through both versions of
>> the table I am loading from (with and without the extra field), I noticed
>> that the natural ordering of the tables is different.  These tables are
>> "staging" tables that I populate with another set of queries and inserts to
>> get the data into a format that is easy to ingest into Solr.  When I add
>> the extra field to these queries, it changes the Oracle query plan as the
>> field is contained in a different table that I need to join to.  As I don't
>> specify an "ORDER BY" on the query (as I didn't think it would make a
>> difference and would slow the query down), Oracle is free to choose how it
>> orders the result set.  Adding the extra field changes that natural
>> ordering, which affects the order things go into my staging table.  As I
>> don't specify an "ORDER BY" when I select things out of the staging table,
>> my data in the scenario that is working is being loaded in a different
>> order to the scenario which doesn't work.
>>
>> I am currently running full loads to verify this under each scenario, as I
>> have now forced the data in the scenario that doesn't work to be in the
>> same order as the scenario that does.  Will see how this load goes
>> overnight.
>>
>> This leads to the question of what difference does it make to Solr what
>> order I load the data in?
>>
>> I also noticed that the .cfs file is quite large in the second scenario,
>> even though this is supposed to be disabled by default in Solr.  I checked
>> my Solr config and there is no override of the default.
>>
>> In answer to your questions:
>>
>> 1) same number of documents - YES ~14,000,000 

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Pratik Patel
@Alessandro I will see if I can reproduce the same issue just by turning
off omitNorms on field type. I'll open another mail thread if required.
Thanks.

On Thu, Feb 15, 2018 at 6:12 AM, Howe, David 
wrote:

>
> Hi Alessandro,
>
> Some interesting testing today that seems to have gotten me closer to what
> the issue is.  When I run the version of the index that is working
> correctly against my database table that has the extra field in it, the
> index suddenly increases in size.  This is even though the data importer is
> running the same SELECT as before (which doesn't include the extra column)
> and loads the same number of rows.
>
> After scratching my head for a bit and browsing through both versions of
> the table I am loading from (with and without the extra field), I noticed
> that the natural ordering of the tables is different.  These tables are
> "staging" tables that I populate with another set of queries and inserts to
> get the data into a format that is easy to ingest into Solr.  When I add
> the extra field to these queries, it changes the Oracle query plan as the
> field is contained in a different table that I need to join to.  As I don't
> specify an "ORDER BY" on the query (as I didn't think it would make a
> difference and would slow the query down), Oracle is free to choose how it
> orders the result set.  Adding the extra field changes that natural
> ordering, which affects the order things go into my staging table.  As I
> don't specify an "ORDER BY" when I select things out of the staging table,
> my data in the scenario that is working is being loaded in a different
> order to the scenario which doesn't work.
>
> I am currently running full loads to verify this under each scenario, as I
> have now forced the data in the scenario that doesn't work to be in the
> same order as the scenario that does.  Will see how this load goes
> overnight.
>
> This leads to the question of what difference does it make to Solr what
> order I load the data in?
>
> I also noticed that the .cfs file is quite large in the second scenario,
> even though this is supposed to be disabled by default in Solr.  I checked
> my Solr config and there is no override of the default.
>
> In answer to your questions:
>
> 1) same number of documents - YES ~14,000,000 documents
> 2) identical documents ( + 1 new field each not indexed) - YES, the second
> scenario has one extra field that is stored but not indexed
> 3) same number of deleted documents - YES, there are zero deleted
> documents in both scenarios
> 4) they both were born from scratch ( an empty index) - YES, both start
> from a brand new virtual server with a brand new installation of Solr
>
> I am using the default auto commit, which I think is 15000 (ms).
>
> Thanks again for your assistance.
>
> Regards,
>
> David
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T  0391067904
>
> M  0424036591
>
> E  david.h...@auspost.com.au
>
> W  auspost.com.au
> W  startrack.com.au
>


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Howe, David

Hi Alessandro,

Some interesting testing today that seems to have gotten me closer to what the 
issue is.  When I run the version of the index that is working correctly 
against my database table that has the extra field in it, the index suddenly 
increases in size.  This is even though the data importer is running the same 
SELECT as before (which doesn't include the extra column) and loads the same 
number of rows.

After scratching my head for a bit and browsing through both versions of the 
table I am loading from (with and without the extra field), I noticed that the 
natural ordering of the tables is different.  These tables are "staging" tables 
that I populate with another set of queries and inserts to get the data into a 
format that is easy to ingest into Solr.  When I add the extra field to these 
queries, it changes the Oracle query plan as the field is contained in a 
different table that I need to join to.  As I don't specify an "ORDER BY" on 
the query (as I didn't think it would make a difference and would slow the 
query down), Oracle is free to choose how it orders the result set.  Adding the 
extra field changes that natural ordering, which affects the order things go 
into my staging table.  As I don't specify an "ORDER BY" when I select things 
out of the staging table, my data in the scenario that is working is being 
loaded in a different order to the scenario which doesn't work.

I am currently running full loads to verify this under each scenario, as I have 
now forced the data in the scenario that doesn't work to be in the same order 
as the scenario that does.  Will see how this load goes overnight.

This leads to the question of what difference does it make to Solr what order I 
load the data in?

I also noticed that the .cfs file is quite large in the second scenario, even 
though this is supposed to be disabled by default in Solr.  I checked my Solr 
config and there is no override of the default.

In answer to your questions:

1) same number of documents - YES ~14,000,000 documents
2) identical documents ( + 1 new field each not indexed) - YES, the second 
scenario has one extra field that is stored but not indexed
3) same number of deleted documents - YES, there are zero deleted documents in 
both scenarios
4) they both were born from scratch ( an empty index) - YES, both start from a 
brand new virtual server with a brand new installation of Solr

I am using the default auto commit, which I think is 15000 (ms).

Thanks again for your assistance.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-15 Thread Alessandro Benedetti
@Pratik: you should have investigated. I understand that solved your issue,
but in case you needed norms, it doesn't make sense that they would cause your
index to grow by a factor of 30. You must have faced a nasty bug if it was
just the norms.

@Howe : 

*Compound File*  .cfs, .cfe  An optional "virtual" file consisting of all the
other index files, for systems that frequently run out of file handles.

*Frequencies*    .doc  Contains the list of docs which contain each term,
along with frequency.

*Field Data*     .fdt  The stored fields for documents.

*Positions*      .pos  Stores position information about where a term occurs
in the index.

*Term Index*     .tip  The index into the Term Dictionary.

So, David, you confirm that those two indexes have:

1) the same number of documents
2) identical documents (+ 1 new field each, not indexed)
3) the same number of deleted documents
4) they were both born from scratch (an empty index)

The matter is still suspicious:
- cfs seems to highlight some sort of malfunctioning during
indexing/committing in relation to the OS. What commit strategy were you
using?

- .doc, .pos, .tip -> they shouldn't change; assuming both indexes are
optimised and you are adding a non-indexed field, those data structures
shouldn't be affected

- the stored content as well: that is too large an increase

Can you send us the full configuration for the new field?
You don't want norms, positions or frequencies for it.
But in case they are the issue, you may have found a very edge case,
because even enabling all of them you shouldn't incur such a penalty for
just one additional tiny field



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Howe, David

I have re-run both scenarios and captured the total size of each type of index 
file.  The MB (1) column is for the baseline scenario, which has the smaller 
index and acceptable performance.  The MB (2) column is after I have added the 
extra field to the index.

Ext      MB (1)     MB (2)
.cfe       0.00       0.01
.cfs     335.01    3612.09
.dii       0.00       0.00
.dim     324.38     319.07
.doc    1094.68    2767.53
.dvd    1211.84     625.44
.dvm       0.14       0.08
.fdt    1633.21    5387.92
.fdx       2.12       1.44
.fnm       0.11       0.12
.loc       0.00       0.00
.nvd     127.84     110.67
.nvm       0.01       0.01
.pos     809.23    1272.70
.si        0.02       0.03
.tim     137.94     156.82
.tip       2.52       3.04
Total   5679.06   14256.98


David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au

-Original Message-
From: Howe, David [mailto:david.h...@auspost.com.au]
Sent: Wednesday, 14 February 2018 12:49 PM
To: solr-user@lucene.apache.org
Subject: RE: Index size increases disproportionately to size of added field 
when indexed=false


I have set docValues=false on all of the string fields in our index that have 
indexed=false and stored=true.  This gave a small improvement in the index size 
from 13.3GB to 12.82GB.

I have also tried running an optimize, which then reduced the index to 12.6GB.

Next step is to dump the sizes of the Solr index files for the index version 
that is the correct size and the version that has the large size.

Regards,

David


David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au

-Original Message-
From: Howe, David [mailto:david.h...@auspost.com.au]
Sent: Wednesday, 14 February 2018 7:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Index size increases disproportionately to size of added field 
when indexed=false


Thanks Hoss.  I will try setting docValues to false, as we only ever want to be 
able to retrieve the value of this field.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au


Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
You are right, in my case this field type was applied to many text fields.
These include many copy fields and dynamic fields as well. In my case,
only specifying omitNorms=true for the field type "text_general" fixed the
issue. I didn't do anything else, nor did I have any other bug.

On Wed, Feb 14, 2018 at 1:01 PM, Alessandro Benedetti 
wrote:

> Hi Pratik,
> how is it possible that just the norms for a single field were causing such
> a massive index size increment in your case?
>
> In your case I think it was for a field type used by multiple fields, but
> it's still suspicious in my opinion; norms shouldn't be that big.
> If I remember correctly, in old versions of Solr before the drop of index
> time boost, norms contained both an approximation of the length of the
> field and the index-time boost.
> From your mailing list problem you moved from 10 Gb to 300 Gb.
> It can't be just the norms; are you sure you didn't face some bug?
>
> Regards
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Alessandro Benedetti
Hi Pratik,
how is it possible that just the norms for a single field were causing such
a massive index size increment in your case?

In your case I think it was for a field type used by multiple fields, but
it's still suspicious in my opinion; norms shouldn't be that big.
If I remember correctly, in old versions of Solr before the drop of index
time boost, norms contained both an approximation of the length of the
field and the index-time boost.
From your mailing list problem you moved from 10 Gb to 300 Gb.
It can't be just the norms; are you sure you didn't face some bug?

Regards



-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Erick Erickson
Pratik may have jumped right to the difference. We'd have gotten there
eventually by looking at file extensions, but just checking his
recommendation would be the first thing to do!

bq:  what would be the right scenarios to use docvalues='true'?

Whenever you want to facet, group or sort on the field. This _will_
increase the index size on disk, but it's almost always a good
tradeoff; here's why:

To facet, group or sort you need to "uninvert" the field. If you have
docValues=false, this uninversion is done at run-time into Java's heap.
If you have docValues=true, the uninversion is done at _index_ time
and the result stored on disk. Now when it's required, it can be
loaded in from disk efficiently (essentially de-serialized) and is
kept in OS memory due to the magic of MMapDirectory, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

bq:  In what situation would it make sense to have indexed=false and
docValues=true?

When you want to return _only_ fields that have docValues=true. If you
return fields with stored=true and docValues=false, Solr/Lucene has to:
1> read the stored values from disk (minimum 16K block)
2> decompress it
3> extract the field

With docValues, since they're only simple field types, all you
have to do is read the value from the docValues structure, which is much
more efficient. HOWEVER, there are two caveats:
1> The entire docValues field will be MMapped, so there's a time/space tradeoff.
2> docValues are stored in a sorted_set. This is relevant for
multiValued fields because:
2a> values are returned in sorted order, not the order they were in the document
2b> identical values are collapsed.

So if the input values for a particular doc were 4, 3, 6, 4, 5, 2, 6,
5, 6, 5, 4, 3, 2 you'd get back 2, 3, 4, 5, 6
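
That sort-and-collapse behaviour is exactly what a Java sorted set does, so it
is easy to illustrate locally:

import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortedSetDemo {
    public static void main(String[] args) {
        // Values in document order, with duplicates, as in the example above.
        List<Integer> input = Arrays.asList(4, 3, 6, 4, 5, 2, 6, 5, 6, 5, 4, 3, 2);
        // A TreeSet sorts and de-duplicates, mirroring what SORTED_SET
        // docValues hand back: prints [2, 3, 4, 5, 6]
        System.out.println(new TreeSet<>(input));
    }
}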

If you can live with those caveats, then returning field values would
involve much less work (both I/O and CPU), especially in
high-throughput situations. NOTE: there are a couple of JIRAs IIRC
that have to do with not storing the  though.

Best,
Erick

On Wed, Feb 14, 2018 at 7:01 AM, Pratik Patel <pra...@semandex.net> wrote:
> I had a similar issue with index size after upgrading to version 6.4.1 from
> 5.x. The issue for me was that the field which caused the index size to
> increase disproportionately had a field type ("text_general") for which the
> default value of omitNorms was not true. Turning it on explicitly for the
> field fixed the problem. Following is the link to my related question. You
> can verify the value of omitNorms for your fields to check whether this is
> applicable in your case or not.
> http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size
>
> On Tue, Feb 13, 2018 at 8:48 PM, Howe, David <david.h...@auspost.com.au>
> wrote:
>
>>
>> I have set docValues=false on all of the string fields in our index that
>> have indexed=false and stored=true.  This gave a small improvement in the
>> index size from 13.3GB to 12.82GB.
>>
>> I have also tried running an optimize, which then reduced the index to
>> 12.6GB.
>>
>> Next step is to dump the sizes of the Solr index files for the index
>> version that is the correct size and the version that has the large size.
>>
>> Regards,
>>
>> David
>>
>>
>> David Howe
>> Java Domain Architect
>> Postal Systems
>> Level 16, 111 Bourke Street Melbourne VIC 3000
>>
>> T  0391067904
>>
>> M  0424036591
>>
>> E  david.h...@auspost.com.au
>>
>> W  auspost.com.au
>> W  startrack.com.au
>>
>> -Original Message-
>> From: Howe, David [mailto:david.h...@auspost.com.au]
>> Sent: Wednesday, 14 February 2018 7:26 AM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Index size increases disproportionately to size of added
>> field when indexed=false
>>
>>
>> Thanks Hoss.  I will try setting docValues to false, as we only ever want
>> to be able to retrieve the value of this field.
>>
>> Regards,
>>
>> David
>>
>> David Howe
>> Java Domain Architect
>> Postal Systems
>> Level 16, 111 Bourke Street Melbourne VIC 3000
>>
>> T  0391067904
>>
>> M  0424036591
>>
>> E  david.h...@auspost.com.au
>>
>> W  auspost.com.au
>> W  startrack.com.au
>>

Re: Index size increases disproportionately to size of added field when indexed=false

2018-02-14 Thread Pratik Patel
I had a similar issue with index size after upgrading to version 6.4.1 from
5.x. The issue for me was that the field which caused the index size to
increase disproportionately had a field type ("text_general") for which the
default value of omitNorms was not true. Turning it on explicitly for the
field fixed the problem. Following is the link to my related question. You can
verify the value of omitNorms for your fields to check whether this is
applicable in your case or not.
http://search-lucene.com/m/Solr/eHNlagIB7209f1w1?subj=Fwd+Solr+dynamic+field+blowing+up+the+index+size
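
A quick way to spot-check a field from code is the Schema API. A hedged SolrJ
sketch ("description_t" and the URL/core name are placeholders; note the
response only lists omitNorms when it is set explicitly, and absence means the
field type's default applies):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;
import org.apache.solr.client.solrj.response.schema.SchemaResponse;

public class CheckOmitNorms {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            // "description_t" is a placeholder; repeat for each suspect field.
            SchemaResponse.FieldResponse rsp =
                new SchemaRequest.Field("description_t").process(client);
            System.out.println("omitNorms = " + rsp.getField().get("omitNorms"));
        }
    }
}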

On Tue, Feb 13, 2018 at 8:48 PM, Howe, David <david.h...@auspost.com.au>
wrote:

>
> I have set docValues=false on all of the string fields in our index that
> have indexed=false and stored=true.  This gave a small improvement in the
> index size from 13.3GB to 12.82GB.
>
> I have also tried running an optimize, which then reduced the index to
> 12.6GB.
>
> Next step is to dump the sizes of the Solr index files for the index
> version that is the correct size and the version that has the large size.
>
> Regards,
>
> David
>
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T  0391067904
>
> M  0424036591
>
> E  david.h...@auspost.com.au
>
> W  auspost.com.au
> W  startrack.com.au
>
> -Original Message-
> From: Howe, David [mailto:david.h...@auspost.com.au]
> Sent: Wednesday, 14 February 2018 7:26 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Index size increases disproportionately to size of added
> field when indexed=false
>
>
> Thanks Hoss.  I will try setting docValues to false, as we only ever want
> to be able to retrieve the value of this field.
>
> Regards,
>
> David
>
> David Howe
> Java Domain Architect
> Postal Systems
> Level 16, 111 Bourke Street Melbourne VIC 3000
>
> T  0391067904
>
> M  0424036591
>
> E  david.h...@auspost.com.au
>
> W  auspost.com.au
> W  startrack.com.au
>


RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David

I have set docValues=false on all of the string fields in our index that have 
indexed=false and stored=true.  This gave a small improvement in the index size 
from 13.3GB to 12.82GB.

I have also tried running an optimize, which then reduced the index to 12.6GB.

Next step is to dump the sizes of the Solr index files for the index version 
that is the correct size and the version that has the large size.

Regards,

David


David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au

-Original Message-
From: Howe, David [mailto:david.h...@auspost.com.au]
Sent: Wednesday, 14 February 2018 7:26 AM
To: solr-user@lucene.apache.org
Subject: RE: Index size increases disproportionately to size of added field 
when indexed=false


Thanks Hoss.  I will try setting docValues to false, as we only ever want to be 
able to retrieve the value of this field.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David

Thanks Hoss.  I will try setting docValues to false, as we only ever want to be 
able to retrieve the value of this field.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au



RE: Index size increases disproportionately to size of added field when indexed=false

2018-02-13 Thread Howe, David

Hi Erick,

Thanks for responding.  You are correct that we don't have any deleted docs.  
When we want to re-index (once a fortnight), we build a brand new installation 
of Solr from scratch and re-import the new data into an empty index.

I will try setting docValues to false and see if that makes a difference.  It 
sounds like we shouldn't have it on anyway, as we only ever want to be able to 
retrieve this field.  In what situation would it make sense to have 
indexed=false and docValues=true?

I will re-index and get a sizing for all of the different file extensions both 
with and without the problematic field.

Regards,

David

David Howe
Java Domain Architect
Postal Systems
Level 16, 111 Bourke Street Melbourne VIC 3000

T  0391067904

M  0424036591

E  david.h...@auspost.com.au

W  auspost.com.au
W  startrack.com.au


