Re: Indexing in one collection affect index in another collection

2019-02-06 Thread Zheng Lin Edwin Yeo
Hi everyone, Does anyone have further updates on this issue? Thank you. Regards, Edwin On Wed, 30 Jan 2019 at 14:17, Zheng Lin Edwin Yeo wrote: > Hi everyone, > > We have tried to do the setup and indexing on the latest Solr 7.6.0. > > However, we faced exactly the same issue a

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi everyone, We have tried to do the setup and indexing on the latest Solr 7.6.0. However, we faced exactly the same issue as we faced in Solr 7.5.0, in which searches on the customers collection slowed down once we indexed the policies collection. Regards, Edwin On Wed, 30 Jan 2019 at 01:19

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
the > indices in RAM and/or change from Windows to Linux, if it is important that > all queries including the first are very fast. > > Have a nice day > Paul > > -Original Message- > From: Shawn Heisey > Sent: Tuesday, 29 January 2019 13:25 > To: solr

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn, No worries, and thanks for your clarification. We made these changes in order to use the Unified Highlighter with hl.offsetSource=POSTINGS, and to add "light" term vectors. The settings come from what is written in the Solr Reference Guide on highlighting, which says the following: *Postings*:
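A minimal sketch of a query that exercises the Unified Highlighter as described here. `hl.method=unified`, `hl.offsetSource=POSTINGS`, `hl.fl`, `hl.snippets` and `hl.fragsize` are standard Solr highlighting parameters; the host, the `customers` collection, the query, and the snippet/fragment sizes are illustrative assumptions. POSTINGS only works if the highlighted field was indexed with offsets in the postings (`storeOffsetsWithPositions="true"` in the schema), which is assumed here. A small `hl.snippets` value also keeps highlighting cheap, which matters given that highlighting is identified elsewhere in this thread as the bulk of the first query's time.

```python
from urllib.parse import urlencode

def highlight_query_url(base_url, collection, q, field, snippets=3, fragsize=100):
    """Build a /select URL that asks the Unified Highlighter to take offsets
    from the postings of `field` (assumes offsets are stored in the postings)."""
    params = [
        ("q", q),
        ("hl", "true"),
        ("hl.method", "unified"),          # use the Unified Highlighter
        ("hl.offsetSource", "POSTINGS"),   # read offsets from postings, no re-analysis
        ("hl.fl", field),
        ("hl.snippets", str(snippets)),    # fewer snippets means less highlighting work
        ("hl.fragsize", str(fragsize)),
    ]
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

url = highlight_query_url("http://localhost:8983", "customers",
                          "searchFields_tcs:solr", "searchFields_tcs")
print(url)
```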

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/29/2019 5:25 AM, Shawn Heisey wrote: Adding termVectors will make the index bigger.  Potentially much bigger. This will increase the overall RAM requirement of the server, especially if the server is handling software other than Solr.  Anything that makes the index bigger can affect

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote: My guess is after we change our searchFields_tcs schema which is: *From*: *To:* Adding termVectors will make the index bigger. Potentially much bigger. This will increase the overall RAM requirement of the server, especially if the

RE: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
the first are very fast. Have a nice day Paul -Original Message- From: Shawn Heisey Sent: Tuesday, 29 January 2019 13:25 To: solr-user@lucene.apache.org Subject: Re: Indexing in one collection affect index in another collection On 1/29/2019 5:06 AM, Zheng Lin Edwin Yeo wrote

RE: Indexing in one collection affect index in another collection

2019-01-29 Thread paul.dodd
. January 2019 13:31 To: 'solr-user@lucene.apache.org' Subject: RE: Indexing in one collection affect index in another collection Hi, If the reason for the difference in speed is that the index is being read from disk, I would expect that the first query would be slow, but subsequent queries

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for your reply. However, we did not delete our index when the screenshot was taken. All the indexes are still in Solr. My guess is that after we changed our searchFields_tcs schema, which is: *From*: *To:* The above change was done in order to use the Solr recommended unified

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Shawn Heisey
On 1/26/2019 4:48 PM, Zheng Lin Edwin Yeo wrote: Thanks for your reply. Below are the replies to your email: 1) We had previously tried setting the heap size to 8g when we faced the same issue, and changing it to 7g did not help either. 2) We are using a standard disk at the moment. 3) In the

Re: Indexing in one collection affect index in another collection

2019-01-29 Thread Zheng Lin Edwin Yeo
Hi Shawn / Jan, Do we have any further insights about this problem? The same problem still happens even after we made the changes and re-indexed all the data. Regards, Edwin On Sun, 27 Jan 2019 at 07:48, Zheng Lin Edwin Yeo wrote: > Hi Shawn, > > Thanks for your reply. Below are the replies to

Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Zheng Lin Edwin Yeo
Hi Shawn, Thanks for your reply. Below are the replies to your email: 1) We had previously tried setting the heap size to 8g when we faced the same issue, and changing it to 7g did not help either. 2) We are using a standard disk at the moment. 3) In the link is the screenshot of the process list

Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Shawn Heisey
On 1/26/2019 9:40 AM, Zheng Lin Edwin Yeo wrote: We have tried adding -a "-XX:+AlwaysPreTouch" to the command that starts Solr, but there is no noticeable difference in the performance. As for the screenshot, I have captured another one after we added -a "-XX:+AlwaysPreTouch", and it is sorted on the Working

Re: Indexing in one collection affect index in another collection

2019-01-26 Thread Zheng Lin Edwin Yeo
Hi Shawn, We have tried adding -a "-XX:+AlwaysPreTouch" to the command that starts Solr, but there is no noticeable difference in the performance. As for the screenshot, I have captured another one after we added -a "-XX:+AlwaysPreTouch", and it is sorted on the Working Set column. Below is the link to the

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Shawn Heisey
On 1/25/2019 9:11 AM, Zheng Lin Edwin Yeo wrote: As requested, below is the link to the screenshot of the resource monitor of our system. https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing The wiki page says to sort on the Working Set column. Your screenshot

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
7.5.0, and currently I am facing an issue where, when I am > > indexing in collection2, the indexing affects the records in collection1. > > Although the records are still intact, it seems that the settings of the > > termVectors get wiped out, and the index size of collection1 redu

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
Hi Shawn, As requested, below is the link to the screenshot of the resource monitor of our system. https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing Regards, Edwin On Fri, 25 Jan 2019 at 23:35, Shawn Heisey wrote: > On 1/25/2019 7:47 AM, Zheng Lin Edwin Yeo

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Shawn Heisey
On 1/25/2019 7:47 AM, Zheng Lin Edwin Yeo wrote: Below is the command that we used to start Solr: cd solr-7.5.0 bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z "localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true pause Can you gather the screenshot mentioned

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
":{ > >>>>>> "time":0.0}, > >>>>>> "expand":{ > >>>>>> "time":0.0}, > >>>>>> "terms":{ > >>>>>>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jan Høydahl
>> wrote: >>>>>> >>>>>>> Looks like highlighting takes most of the time on the first query >>>>>>> (680ms). Your config seems to ask for a lot of highlighting here, >> like 100 >>>>>>> snippets of max

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
>>>> Cominvent AS - www.cominvent.com > >>>>> > >>>>>> On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo < > >>>>> edwinye...@gmail.com> wrote:

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jan Høydahl
r reply. >>>>>> >>>>>> Below is what you requested about our Solr setup, configuration >>>>>> files, schema and results of debug queries: >>>>>> >>>>>> Looking forward to your advice and support

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Jörn Franke
> I am using Solr 7.5.0, and currently I am facing an issue where, when I am > indexing in collection2, the indexing affects the records in collection1. > Although the records are still intact, it seems that the settings of the > termVectors get wiped out, and the index size of collecti

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
which can be downloaded from the following link: >>>> > >>>> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing >>>> > >>>> > 3. The debug queries from both collections >>>>

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
q=5.0\n), product of:\n >>> > 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / >>> > (docFreq + 0.5)) from:\n 812.0 = docFreq\n 600.0 = >>> > docCount\n1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / >>> > (freq + k1 * (1 - b

Re: Indexing in one collection affect index in another collection

2019-01-25 Thread Zheng Lin Edwin Yeo
> b\n 19.397041 = avgFieldLength\n 31.0 = fieldLength\n",.. >> > >> >"QParser":"LuceneQParser", >> > >> >"timing":{ >> > >> > "time":681.0, >> > >>
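The explain fragments quoted above spell out Lucene's BM25 formulas. The sketch below recomputes them in Python. It assumes Lucene's default parameters k1 = 1.2 and b = 0.75 (the snippets do not show them); with a term frequency of 5.0, fieldLength = 31.0 and avgFieldLength = 19.397041, taken from the quoted debug output, it reproduces the printed tfNorm of 1.6324438.

```python
import math

def bm25_idf(doc_freq: float, doc_count: float) -> float:
    # idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq: float, field_length: float, avg_field_length: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    # tfNorm, computed as (freq * (k1 + 1)) /
    #   (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))
    length_ratio = field_length / avg_field_length
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * length_ratio))

tf_norm = bm25_tf_norm(5.0, 31.0, 19.397041)
print(round(tf_norm, 7))  # 1.6324438
```

Because this field is longer than average (31 terms versus about 19.4), its tfNorm stays well below the upper bound of k1 + 1 = 2.2 that tfNorm approaches as freq grows.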

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
:0.0, > > > >"query":{ > > > > "time":0.0}, > > > >"facet":{ > > > > "time":0.0}, > > > >"facet_module":{ > > >

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Jan Høydahl
>"debug":{ > > "time":0.0}}, > > "process":{ > >"time":680.0, > >"query":{ > > "time":19.0}, > >"facet":{ >

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
ery":{ "time":19.0}, "facet":{ "time":0.0}, "facet_module":{ "time":0.0}, "mlt":{ "time":0.0}, "highlight":{

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Jan Høydahl
> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: >>> id=13245417 >>> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 >>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2] >>> o.a.s.u.p.StatelessScriptUpdateProcessorF

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: >> id=13245430 >> 2019-01-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 >> s:shard1 r:core_node4 x:collection1_shard1_replica_n2] >> o.a.s.u.p.StatelessScriptUpdateProcessorFacto

Re: Indexing in one collection affect index in another collection

2019-01-24 Thread Zheng Lin Edwin Yeo
1-24 02:47:57.957 INFO (qtp2131952342-1330) [c:collectioin1 > s:shard1 r:core_node4 x:collection1_shard1_replica_n2] > o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: > id=13245435 > > There is no change to the segments info, but the slowdown in t

Re: Indexing in one collection affect index in another collection

2019-01-23 Thread Zheng Lin Edwin Yeo
) [c:collectioin1 s:shard1 r:core_node4 x:policies_shard1_replica_n2] o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd: id=13245435 There is no change to the segments info, but the slowdown in the first collection is very drastic. Before the indexing of collection2, the collection1 query

Re: Indexing in one collection affect index in another collection

2019-01-23 Thread Shawn Heisey
On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote: I am using Solr 7.5.0, and currently I am facing an issue where, when I am indexing in collection2, the indexing affects the records in collection1. Although the records are still intact, it seems that the settings of the termVectors get wiped out

Indexing in one collection affect index in another collection

2019-01-23 Thread Zheng Lin Edwin Yeo
Hi, I am using Solr 7.5.0, and currently I am facing an issue where, when I am indexing in collection2, the indexing affects the records in collection1. Although the records are still intact, it seems that the settings of the termVectors get wiped out, and the index size of collection1 reduced from

Solr indexing raises error while posting PDF

2019-01-23 Thread sonam mittal
I am using Solr 6.6.4 and Ubuntu 16. I have created a collection in Solr using the configuration files of the Solr example *techproducts*. I am trying to post a PDF to Solr, but it is raising some errors. I have also installed Apache Tika through Maven, but it is still showing the

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-19 Thread Zheng Lin Edwin Yeo
se/TIKA-2814?focusedCommentId=16745263=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16745263 > > Please, though, for the sake of Solr, please run Tika outside of Solr > in production (e.g. SolrJ...see: > https://lucidworks.com/2012/02/14/indexing-with-solrj/) > > O

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-17 Thread Tim Allison
, for the sake of Solr, please run Tika outside of Solr in production (e.g. SolrJ...see: https://lucidworks.com/2012/02/14/indexing-with-solrj/) On Thu, Jan 17, 2019 at 2:15 AM Zheng Lin Edwin Yeo wrote: > > Based on the discussion in Tika and also on the Jira (TIKA-2814), it was > said that

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-16 Thread Zheng Lin Edwin Yeo
book antiqua, palatino, serif; Josh ", >> > >> > >> > As you can see, this is taken from the Content-Type: text/html. >> > However, the Content-Type: text/plain looks clean, and that is what we >> want >> > to be indexed. >> > >>

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-14 Thread Zheng Lin Edwin Yeo
can we configure Tika in Solr to change the priority, so that it gets the > > content from Content-Type: text/plain instead of Content-Type: > text/html? > > > > On Mon, 14 Jan 2019 at 11:18, Zheng Lin Edwin Yeo > > wrote: > > > > > Hi, > > > > > > I am using

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-14 Thread Terry Steichen
Using 6.6.0, I am able to index EML files just fine. The trick is, when indexing .eml files, add "-filetypes eml" to the command line (note the plural filetypes). Terry Steichen On 1/13/19 10:18 PM, Zheng Lin Edwin Yeo wrote: > Hi, > > I am using Solr 7

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-14 Thread Alexandre Rafalovitch
Type: text/html? > > On Mon, 14 Jan 2019 at 11:18, Zheng Lin Edwin Yeo > wrote: > > > Hi, > > > > I am using Solr 7.5.0 with Tika 1.18. > > > > Currently I am facing a situation during the indexing of EML files, > > whereby the content is being extrac

Re: Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-13 Thread Zheng Lin Edwin Yeo
t; > I am using Solr 7.5.0 with Tika 1.18. > > Currently I am facing a situation during the indexing of EML files, > whereby the content is being extracted from the Content-type=text/html > instead of Content-type=text/plain. > > The problem with Content-type=text/html is that it

Content from EML files indexing from text/html (which is not clean) instead of text/plain

2019-01-13 Thread Zheng Lin Edwin Yeo
Hi, I am using Solr 7.5.0 with Tika 1.18. Currently I am facing a situation during the indexing of EML files, whereby the content is being extracted from the Content-type=text/html instead of Content-type=text/plain. The problem with Content-type=text/html is that it contains a lot of words like

Re: Improve indexing speed?

2019-01-01 Thread Shawn Heisey
On 1/1/2019 8:59 AM, John Milton wrote: My document contains 65 fields. All the fields need to be indexed, but indexing 100 documents takes 10 seconds. I am using Solr 7.5 (2 cloud instances), with 50 shards. The best way to achieve fast indexing in Solr is to index multiple items

Re: Improve indexing speed?

2019-01-01 Thread Hendrik Haddorp
How are you indexing the documents? Are you using SolrJ or the plain REST API? Are you sending the documents one by one or all in one request? The performance is far better if you send the 100 documents in one request. If you send them individually, are you doing any commits between them
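A minimal sketch of the batching Hendrik describes, using only the Python standard library: documents are grouped into batches and each batch becomes one JSON POST to the collection's `/update` handler, with no explicit commit per request. The collection name, field names, and batch size are hypothetical; relying on the server's autoCommit/autoSoftCommit settings instead of client-side commits is an assumption about a typical setup, not something stated in this thread.

```python
import json
import urllib.request

def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def build_update_request(base_url, collection, docs):
    """Build (but do not send) one JSON update request carrying many documents.
    No commit parameter is added; commits are left to the server's autoCommit."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/solr/{collection}/update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical documents and collection name, for illustration only.
docs = [{"id": str(i), "title_t": f"doc {i}"} for i in range(250)]
reqs = [build_update_request("http://localhost:8983", "mycollection", b)
        for b in batches(docs, 100)]
# Against a live Solr, send each one with urllib.request.urlopen(req).
```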

Re: Improve indexing speed?

2019-01-01 Thread Erick Erickson
of the norm. Best, Erick On Tue, Jan 1, 2019 at 9:05 AM John Milton wrote: > > Hi to all, > > My document contains 65 fields. All the fields need to be indexed, but > indexing 100 documents takes 10 seconds. > I am using Solr 7.5 (2 cloud instances), with 50 shards. > It's

Improve indexing speed?

2019-01-01 Thread John Milton
Hi to all, My document contains 65 fields. All the fields need to be indexed, but indexing 100 documents takes 10 seconds. I am using Solr 7.5 (2 cloud instances), with 50 shards. It's running on Windows and has 32 GB RAM, with a 15 GB Java heap. How can I improve indexing speed

Re: Facing issue while transforming and indexing custom JSON

2018-12-31 Thread Alexandre Rafalovitch
'https://lucene.apache.org/solr/guide/7_5/transforming-and-indexing-custom-json.html#setting-json-defaults' > > for transforming and indexing custom JSON and added the code in one of the > > cores to upload a multilevel JSON. It is throwing the below err

Re: Facing issue while transforming and indexing custom JSON

2018-12-31 Thread Shubhangi Shinde
uide/7_5/transforming-and-indexing-custom-json.html#setting-json-defaults' > for transforming and indexing custom JSON and added the code in one of the > cores to upload a multilevel JSON. It is throwing the below error. I spent > a lot of time trying to solve this error, with no luck. The error is, > > {
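For the custom-JSON case, the Reference Guide page linked in this thread describes the `split` and `f` request parameters of `/update/json/docs`. The sketch below only builds such a URL; the collection name and the `/exams`, `first` and `marks` paths follow the style of the Reference Guide's example and are assumptions, not the poster's actual data. Setting `echo=true` asks Solr to return the transformed documents without indexing them, which is a convenient way to debug a mapping before sending real data.

```python
from urllib.parse import urlencode

def json_docs_url(base_url, collection, split, field_map, echo=False):
    """Build an /update/json/docs URL that splits a nested JSON payload into
    documents (`split`) and maps JSON paths to Solr fields (repeated `f`)."""
    params = [("split", split)]
    params += [("f", f"{field}:{path}") for field, path in field_map]
    if echo:
        params.append(("echo", "true"))  # return transformed docs, index nothing
    return f"{base_url}/solr/{collection}/update/json/docs?{urlencode(params)}"

# Hypothetical collection and JSON paths, for illustration only.
url = json_docs_url("http://localhost:8983", "mycollection", "/exams",
                    [("first", "/first"), ("marks", "/exams/marks")], echo=True)
print(url)
```

The JSON payload itself would then be POSTed to that URL with `Content-Type: application/json`.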

Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Walter Underwood
lr-index]Can I do a lot of analysis on one field at the time > of indexing? > > Right, no feature that does that for you. > > You should be able to code that with an update request processor script. > You can fetch an analyzer chain, run it, add the results to a field, then do >

RE: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread 유정인
I do a lot of analysis on one field at the time of indexing? Right, no feature that does that for you. You should be able to code that with an update request processor script. You can fetch an analyzer chain, run it, add the results to a field, then do that again. I have one that runs a chain

Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Walter Underwood
Right, no feature that does that for you. You should be able to code that with an update request processor script. You can fetch an analyzer chain, run it, add the results to a field, then do that again. I have one that runs a chain with minhash then saves the hex values of the hashes to a

Re: [solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread Erick Erickson
In a word, "no". A field can have exactly one tokenizer, and there are no conditional filters. You can copyField to multiple individual fields and treat each one of those differently, i.e. copy from title to title1, title2 etc. where each one has a different analysis chain. Best, Erick On Thu,
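Erick's copyField approach can be set up through the Schema API. This sketch builds one JSON payload that adds two destination fields, each with its own field type (and therefore its own analyzer chain), and copies `title` into both. The field names `title1`/`title2` come from Erick's example; the field types `text_general` and `text_ws` are assumptions, so substitute types that actually exist in your schema.

```python
import json

def copy_field_commands(source, targets):
    """Build a Schema API payload that adds one field per (name, type) target,
    each with its own analysis chain, and copies `source` into all of them."""
    commands = {
        "add-field": [
            {"name": name, "type": field_type, "indexed": True, "stored": False}
            for name, field_type in targets
        ],
        "add-copy-field": {"source": source, "dest": [name for name, _ in targets]},
    }
    return json.dumps(commands)

payload = copy_field_commands("title", [("title1", "text_general"), ("title2", "text_ws")])
# POST `payload` to http://localhost:8983/solr/<collection>/schema
# with Content-Type: application/json.
print(payload)
```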

[solr-index]Can I do a lot of analysis on one field at the time of indexing?

2018-12-13 Thread 유정인
Hello, I have a question about index schemas. 1) Can I apply several different analyses to one field? For example, can I analyze the 'title' field with multiple tokenizers and merge the results into a single field? 2) I can collect multiple fields into one field using the 'copyField' function. However,

Re: Enquiry about scheduling for re-indexing

2018-11-29 Thread Shawn Heisey
On 11/29/2018 6:00 AM, Alexandre Rafalovitch wrote: Solr does not have a built-in scheduler for triggering indexing. Only for triggering commits and purging auto-expiring records. So, if you want to trigger DIH indexing, you need to use an external scheduling mechanism for that. What

Re: Enquiry about scheduling for re-indexing

2018-11-29 Thread Alexandre Rafalovitch
Solr does not have a built-in scheduler for triggering indexing. Only for triggering commits and purging auto-expiring records. So, if you want to trigger DIH indexing, you need to use an external scheduling mechanism for that. Regards, Alex. On Thu, 29 Nov 2018 at 01:03, Ma Man wrote
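Since Solr has no built-in scheduler for DIH, the trigger has to come from outside, typically cron or the Windows Task Scheduler calling the `/dataimport` URL. The sketch below shows the same idea in Python: a hypothetical helper builds the DIH URL (`command`, `clean` and `commit` are DIH request parameters), and a simple loop stands in for the external scheduler. The host, collection name, and interval are assumptions; the `sleep` argument is injectable only so the loop can be exercised without waiting.

```python
import time
from urllib.parse import urlencode

def dih_url(base_url, collection, command="delta-import", clean=False, commit=True):
    # Hypothetical helper: builds the same kind of /dataimport URL quoted in the question.
    params = {"command": command, "clean": str(clean).lower(), "commit": str(commit).lower()}
    return f"{base_url}/solr/{collection}/dataimport?{urlencode(params)}"

def run_every(interval_s, action, iterations, sleep=time.sleep):
    """Call `action` every `interval_s` seconds, `iterations` times.
    A stand-in for cron or Task Scheduler, which are the more robust choice."""
    results = []
    for i in range(iterations):
        results.append(action())
        if i < iterations - 1:
            sleep(interval_s)
    return results

# Against a live server the action would be something like:
#   lambda: urllib.request.urlopen(dih_url("http://localhost:8983", "mycollection")).read()
```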

Enquiry about scheduling for re-indexing

2018-11-28 Thread Ma Man
To whom it may concern, Recently, I have been studying whether Apache Solr is able to re-index (Full Import / Delta Import) periodically through configuration, instead of being triggered by a URL (e.g. http://localhost:8983/solr/{collection_name}/dataimport?command=full-import) from a scheduler tool. Version of the Solr

RE: indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
Hi Jan, Thanks for your quick reply! I was afraid you would suggest this. I have already moved much of the indexing application out of Solr, which gives me the desired flexibility, but I am a bit concerned about how much time it takes. Right now I have about 20,000 XML documents

Re: indexing multiple levels of data

2018-11-16 Thread Jan Høydahl
Hi Martin, For a complex use case such as this, I would recommend you write a separate indexer application that crawls the files, looks up the correct metadata XMLs based on given business rules, and then constructs the full Solr document to send to Solr. Even for parsing full text from PDF etc., I would
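A minimal sketch of the separate indexer application Jan recommends: it reads the case-level and file-level metadata XML, merges them with the file's extracted text, and produces one flat Solr document. The XML shapes, the `case_`/`file_` field prefixes, and using the file path as `id` are all hypothetical; text extraction (for example with Tika, running outside Solr) is assumed to have happened already and is passed in as a string.

```python
import xml.etree.ElementTree as ET

def parse_metadata(xml_text):
    """Flatten a simple <record><field>value</field>...</record> document into a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def build_solr_doc(case_xml, file_xml, file_path, extracted_text):
    """Merge case metadata, file metadata and extracted text into one document."""
    doc = {"id": file_path, "content": extracted_text}
    # Prefixes keep case-level and file-level fields from overwriting each other.
    doc.update({f"case_{k}": v for k, v in parse_metadata(case_xml).items()})
    doc.update({f"file_{k}": v for k, v in parse_metadata(file_xml).items()})
    return doc

# Hypothetical input shapes, for illustration only.
case_xml = "<case><caseId>C-1</caseId><owner>Legal</owner></case>"
file_xml = "<file><caseId>C-1</caseId><title>Report</title></file>"
doc = build_solr_doc(case_xml, file_xml, "/data/C-1/report.pdf", "full text of the report")
print(doc)
```

Documents built this way can then be sent to Solr in batches rather than one at a time.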

indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to add metadata and files to Solr, but am experiencing some problems. The data is divided into two types: cases and files. For each case the metadata is given in an XML document, while the metadata for the files is given in another XML document, and the actual files are kept in yet

Re: ClassNotFound indexing crypted documents

2018-11-14 Thread Shawn Heisey
On 11/13/2018 11:51 AM, Luca Vergantini wrote: Maybe I skipped the correct steps to open an issue, but here https://issues.apache.org/jira/browse/SOLR-12985 you can find the details. I think that it is at least a configuration issue for the install script, but it may be harder than that. I detected this

Re: Indexing vs Search node

2018-11-14 Thread Fernando Otero
Thanks everyone, this gave me great arguments for migrating to Solr 7 :D On Fri, Nov 9, 2018 at 7:50 PM Shawn Heisey wrote: > On 11/9/2018 1:58 PM, David Hastings wrote: > > I personally like standalone Solr for this reason: I can tune the > indexing > > "master"

ClassNotFound indexing crypted documents

2018-11-13 Thread Luca Vergantini
Maybe I skipped the correct steps to open an issue, but here https://issues.apache.org/jira/browse/SOLR-12985 you can find the details. I think that it is at least a configuration issue for the install script, but it may be harder than that. I detected this problem on various servers, installed by the

Re: Indexing vs Search node

2018-11-09 Thread Shawn Heisey
On 11/9/2018 1:58 PM, David Hastings wrote: I personally like standalone Solr for this reason: I can tune the indexing "master" for doing nothing but taking in documents, and that way the slaves don't battle for resources in the process. SolrCloud can be set up pretty similarly to this

Re: Indexing vs Search node

2018-11-09 Thread David Hastings
I personally like standalone Solr for this reason: I can tune the indexing "master" for doing nothing but taking in documents, and that way the slaves don't battle for resources in the process. On Fri, Nov 9, 2018 at 3:10 PM Erick Erickson wrote: > Fernando: > > I'd phrase it

Re: Indexing vs Search node

2018-11-09 Thread Erick Erickson
Fernando: I'd phrase it more strongly than Shawn. Prior to 7.0 all replicas both indexed and searched (they were NRT replicas), so there wasn't any choice but to index and search on every replica. It's one of those things that if you have very high throughput (indexing) situations, you _might_ want
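On Solr 7, the indexing/search split Erick alludes to can be expressed with replica types. The sketch builds two URLs: a Collections API `CREATE` call that uses TLOG replicas (which keep a transaction log and can become leader) plus PULL replicas (which only copy segments from the leader and serve queries), and a query that prefers PULL replicas via `shards.preference=replica.type:PULL`, a parameter available since Solr 7.4. The collection name and the shard and replica counts are illustrative assumptions, and the configset is left to the server default.

```python
from urllib.parse import urlencode

def create_collection_url(base_url, name, num_shards, tlog_replicas, pull_replicas):
    """Collections API CREATE with no NRT replicas: TLOG replicas handle the
    leader role, PULL replicas replicate from the leader and serve queries."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,
        "tlogReplicas": tlog_replicas,
        "pullReplicas": pull_replicas,
    }
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"

def query_pull_first_url(base_url, collection, q):
    # Prefer PULL replicas for queries when any are available.
    params = {"q": q, "shards.preference": "replica.type:PULL"}
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

# Hypothetical collection name and sizing, for illustration only.
create_url = create_collection_url("http://localhost:8983", "listings", 2, 2, 2)
search_url = query_pull_first_url("http://localhost:8983", "listings", "*:*")
print(create_url)
print(search_url)
```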

Re: Indexing vs Search node

2018-11-09 Thread Shawn Heisey
On 11/9/2018 12:13 PM, Fernando Otero wrote: I read in several blog posts that it's never a good idea to index and search on the same node. I wonder how that can be achieved in Solr Cloud or if it happens automatically. I would disagree with that blanket assertion. Indexing does put

Indexing vs Search node

2018-11-09 Thread Fernando Otero
Hi guys, I read in several blog posts that it's never a good idea to index and search on the same node. I wonder how that can be achieved in Solr Cloud or if it happens automatically. -- Fernando Otero Sr Engineering Manager, Panamera Buenos Aires - Argentina Mobile: +54 911 67697108

RE: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Phil Scadden
mmit(solr, "prindex"); return true; -Original Message- From: Erick Erickson Sent: Wednesday, 31 October 2018 06:00 To: solr-user Subject: Re: Indexing PDF file in Apache SOLR via Apache TIKA All of the above work, but for robust production situations you'll wan

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread ☼ R Nair
Erick Erickson wrote: > All of the above work, but for robust production situations you'll > want to consider a SolrJ client, see: > https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog > combines indexing from a DB and using Tika, but those are independent. > > Best

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Erick Erickson
All of the above work, but for robust production situations you'll want to consider a SolrJ client, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/. That blog combines indexing from a DB and using Tika, but those are independent. Best, Erick On Tue, Oct 30, 2018 at 12:21 AM Kamuela

Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Kamuela Lau
mode? Or is it only > with CLI mode? If it is only possible with CLI mode, can you explain it to me, please > ? > 2. Is it possible to add a text result in the "Query" tab? > > The background to my question is that I want to index PDFs on my local > system, then I just upload

Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread adiyaksa kevin
F File to SOLR via TIKA in GUI mode? Or is it only possible with CLI mode? If it is only possible with CLI mode, can you explain it to me, please? 2. Is it possible to add a text result in the "Query" tab? The background to my question is that I want to index PDFs on my local system, then I just upl
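One way to do this without the GUI is to stream the PDF to the ExtractingRequestHandler (Solr Cell), which runs Tika inside Solr. The sketch below builds such a request with the standard library; it assumes the collection's solrconfig.xml defines `/update/extract` (the techproducts example config does), and the collection name and document id are hypothetical. As Erick notes in this thread, for robust production use it is better to run Tika in a separate client, for example via SolrJ.

```python
import urllib.request
from urllib.parse import urlencode

def extract_request(base_url, collection, doc_id, pdf_bytes, commit=True):
    """Build (not send) a POST that streams a PDF to /update/extract.
    literal.id sets the id of the resulting Solr document."""
    params = {"literal.id": doc_id, "commit": str(commit).lower()}
    return urllib.request.Request(
        f"{base_url}/solr/{collection}/update/extract?{urlencode(params)}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )

# Hypothetical usage: read a local PDF and send it to a running Solr.
#   with open("sample.pdf", "rb") as fh:
#       req = extract_request("http://localhost:8983", "techproducts", "doc1", fh.read())
#   urllib.request.urlopen(req)
```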

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Moshe Recanati | KMS
Thank you. Will check all options and let you know. From: Alexandre Rafalovitch Sent: Sunday, October 21, 2018 8:09:34 PM To: solr-user Subject: Re: Error while indexing Thai core with SolrCloud Ok, That may have been a bit too much :-) However, it was useful

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Alexandre Rafalovitch
Ok, that may have been a bit too much :-) However, it was useful. There seem to be several possible avenues: 1) You are using SolrJ and your SolrJ version is not the same as the version of the Solr server. There were a bunch of things that could trigger this, especially in combination with Unicode

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Moshe Recanati | KMS
Hi, Thank you. Full stacktrace below "core_node_name":"172.19.218.201:8082_solr_core_th"}DEBUG - 2018-10-19 02:13:20.343; org.apache.zookeeper.ClientCnxn$SendThread; Reading reply sessionid:0x200b5a04a770005, packet:: clientPath:null serverPath:null finished:false header:: 356,1

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Alexandre Rafalovitch
gt; > From: Alexandre Rafalovitch > Sent: Sunday, October 21, 2018 5:18:24 PM > To: solr-user > Subject: Re: Error while indexing Thai core with SolrCloud > > I would check if the Byte-order mark is the cause: > https://urldefense.proofpoint.com/v2/
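To test Alexandre's byte-order-mark hypothesis, a client can strip a leading UTF-8 BOM (bytes EF BB BF) before sending the file to Solr. This is a diagnostic sketch only; whether a BOM is actually the cause of the Thai-core error is not established in the thread.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte-order mark (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def has_utf8_bom(data: bytes) -> bool:
    """True if the payload starts with a UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)
```

Running every payload through `has_utf8_bom` before posting shows quickly whether BOMs are present at all.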

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Moshe Recanati | KMS
Hi Alexandre, Thank you. How does this explain why the issue exists only with SolrCloud and not standalone? Moshe From: Alexandre Rafalovitch Sent: Sunday, October 21, 2018 5:18:24 PM To: solr-user Subject: Re: Error while indexing Thai core with SolrCloud I would

Re: Error while indexing Thai core with SolrCloud

2018-10-21 Thread Alexandre Rafalovitch
at 09:55, Moshe Recanati | KMS wrote: > > Hi, > > We have a specific exception that happens only on the Thai core, and only when we're > using SolrCloud. > > The same indexing activity runs successfully on the EN core with > SolrCloud, or on the Thai core with a standalone co

Error while indexing Thai core with SolrCloud

2018-10-21 Thread Moshe Recanati | KMS
Hi, We have a specific exception that happens only on the Thai core, and only when we're using SolrCloud. The same indexing activity runs successfully on the EN core with SolrCloud, or on the Thai core with a standalone configuration. We're running on Linux with Solr 4.6

Re: Indexing documents from S3 bucket

2018-10-08 Thread ☼ R Nair
On Mon, Oct 8, 2018, 11:26 AM marotosg wrote: > Hi, > > At the moment I have a SolrCloud Cluster with a documents collection being > populated by indexing documents coming from a DFS server. Linux boxes are > mounting that DFS server using Samba. > > There is a request to

Indexing documents from S3 bucket

2018-10-08 Thread marotosg
Hi, At the moment I have a SolrCloud Cluster with a documents collection being populated by indexing documents coming from a DFS server. Linux boxes are mounting that DFS server using Samba. There is a request to move that DFS server to an AWS S3 bucket. Does anyone have previous experience with

Re: Making Solr Indexing Errors Visible

2018-09-30 Thread Jason Gerlowski
Good luck, Jason On Thu, Sep 27, 2018 at 9:58 AM Shawn Heisey wrote: > > On 9/26/2018 2:39 PM, Terry Steichen wrote: > > Let me try to clarify a bit - I'm just using bin/post to index the files > > in a directory. That indexing process produces a lengthy screen display > &g

Re: Making Solr Indexing Errors Visible

2018-09-27 Thread Shawn Heisey
On 9/26/2018 2:39 PM, Terry Steichen wrote: Let me try to clarify a bit - I'm just using bin/post to index the files in a directory.  That indexing process produces a lengthy screen display of files that were indexed.  (I realize this isn't production-quality, but I'm not ready for production

Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Shawn Heisey
On 9/26/2018 2:39 PM, Terry Steichen wrote: To the best of my knowledge, I'm not using SolrJ at all.  Just Solr-out-of-the-box.  In this case, if I understand you below, it "should indicate an error status" I think you'd know if you were using SolrJ directly.  You'd have written th

Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
using bin/post command (we just found this out) No, I said that at the outset.  And repeated it. > 2) You are indexing a bunch of files (what format? all same or different?) I also said I was indexing a mixture of pdf and doc files > 3) You are indexing them into a Schema supposedly ready for

Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Alexandre Rafalovitch
The challenge here is to figure out exactly what you are doing, because the original description could have been 10 different things. So: 1) You are using bin/post command (we just found this out) 2) You are indexing a bunch of files (what format? all same or different?) 3) You are indexing them

Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
Shawn, To the best of my knowledge, I'm not using SolrJ at all.  Just Solr-out-of-the-box.  In this case, if I understand you below, it "should indicate an error status"  But it doesn't. Let me try to clarify a bit - I'm just using bin/post to index the files in a directory.  Tha

Re: Making Solr Indexing Errors Visible

2018-09-26 Thread Shawn Heisey
On 9/26/2018 1:23 PM, Terry Steichen wrote: I'm pretty sure this was covered earlier.  But I can't find references to it.  The question is how to make indexing errors clear and obvious. If there's an indexing error and you're NOT using the concurrent client in SolrJ, the response that Solr
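Shawn's point is that, outside SolrJ's concurrent client (which handles errors internally rather than returning them per request), each update request gets a response that carries an error status. As a minimal sketch, assuming the update responses are returned as JSON, a client-side check could look like the following. The function name `update_error` is hypothetical and is not part of Solr, SolrJ, or bin/post.

```python
import json
from typing import Optional


def update_error(http_status: int, body: str) -> Optional[str]:
    """Describe a failed Solr update request, or return None on success.

    Assumes a JSON response body, where success is reported as
    responseHeader.status == 0 and a failure carries an "error" object
    with "msg" and "code" fields.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}  # e.g. an HTML error page from the servlet container
    if not isinstance(data, dict):
        data = {}
    header_status = data.get("responseHeader", {}).get("status", 0)
    if http_status == 200 and header_status == 0:
        return None
    error = data.get("error", {})
    msg = error.get("msg", "no error message in response body")
    code = error.get("code", http_status)
    return f"HTTP {http_status}, code {code}: {msg}"
```

Logging every non-None result (together with the file that was being sent) is one way to make the failures visible instead of letting them scroll past.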

Making Solr Indexing Errors Visible

2018-09-26 Thread Terry Steichen
I'm pretty sure this was covered earlier.  But I can't find references to it.  The question is how to make indexing errors clear and obvious.  (I find that there are maybe 10% more files in a directory than end up in the index.  I presume they were indexing errors, but I have no idea which ones
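One way to find out *which* files are missing is to compare the directory listing with the ids actually stored in the index. The sketch below assumes that each document's `id` is the file's absolute path. That is what bin/post commonly uses for rich documents, but check a few ids in your own index before relying on it. The function name is hypothetical.

```python
import os
from typing import Iterable, List


def files_missing_from_index(directory: str, indexed_ids: Iterable[str]) -> List[str]:
    """Return files under `directory` whose absolute path is not an indexed id.

    Assumption: the id of each indexed document is the absolute file path.
    """
    ids = set(indexed_ids)
    missing = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.abspath(os.path.join(root, name))
            if path not in ids:
                missing.append(path)
    return sorted(missing)
```

The indexed ids could be collected with a query such as `q=*:*&fl=id`, paging through the results (for example with `cursorMark`) if the index is large.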

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-14 Thread Vadim Ivanov
s0 ... s3. After 120 sec of indexing I receive an IdleTimeout from the shard leader of s5. s4 receives no data and does not seem to open a connection at all, so no timeout occurs. s0...s3 receive data and no timeout occurs. When I tweak IdleTimeout in /opt/solr-7.4.0/server/etc/jetty-http.xml it helps, Bu

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-14 Thread Mikhail Khludnev
ind out more details. > The timeout occurs when, during a long indexing run, some documents at the beginning go to one shard > and then no data at all goes to that shard for a long time (more than 120 sec). > The connection to that core, opened at the beginning of indexing, goes idle >

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-13 Thread Vadim Ivanov
Hi, I've run some more tests on the issue and managed to find out more details. The timeout occurs when, during a long indexing run, some documents at the beginning go to one shard and then no data at all goes to that shard for a long time (more than 120 sec). The connection to that core, opened
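The pattern Vadim describes can be checked ahead of time from the routing of documents to implicit shards. Below is a minimal, hypothetical model: a shard that receives at least one document holds a connection that times out if it stays idle longer than the idle timeout (120 s here, matching the window in the report) before the run ends. A shard that never receives documents never opens a connection, as observed for s4. This is a sketch of the reported behaviour, not of Solr's or Jetty's internals.

```python
from typing import Dict, List


def shards_at_risk(sends: Dict[str, List[float]], run_end: float,
                   idle_timeout: float = 120.0) -> List[str]:
    """Return shards whose connection would sit idle longer than idle_timeout.

    sends maps a shard name to the times (seconds from the start of the run)
    at which documents were routed to it; run_end is when indexing finishes.
    """
    at_risk = []
    for shard, times in sends.items():
        if not times:
            continue  # no documents, so no connection is opened and none can time out
        ordered = sorted(times) + [run_end]
        longest_gap = max(b - a for a, b in zip(ordered, ordered[1:]))
        if longest_gap > idle_timeout:
            at_risk.append(shard)
    return sorted(at_risk)
```

Raising the idle timeout (the jetty-http.xml tweak mentioned in the thread) widens the allowed gap. Interleaving documents across shards during the run shrinks the gaps instead.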

Re: Docker and Solr Indexing

2018-09-12 Thread Shawn Heisey
On 9/12/2018 7:43 AM, Dominique Bejean wrote: Are you aware of issues with Java applications in Docker when the Java version is not 10? https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/ Solr explicitly sets heap size when it starts, so Java is *NOT* determining

Re: Docker and Solr Indexing

2018-09-12 Thread Dominique Bejean
Hi, Are you aware of issues with Java applications in Docker when the Java version is not 10? https://blog.docker.com/2018/04/improved-docker-container-integration-with-java-10/ Regards. Dominique On Wed, 12 Sep 2018 at 05:42, Shawn Heisey wrote: > On 9/11/2018 9:20 PM, solrnoobie wrote: >

Idle Timeout while DIH indexing and implicit sharding in 7.4

2018-09-12 Thread Вадим Иванов
Hello gurus, I am using SolrCloud with DIH for indexing my data. Testing 7.4.0 with an implicitly sharded collection, I have noticed that any indexing run longer than 2 minutes always fails, with many timeout records in the log coming from all replicas in the collection, such as: x:Mycol_s_0_replica_t40

Re: Docker and Solr Indexing

2018-09-11 Thread Shawn Heisey
On 9/11/2018 9:20 PM, solrnoobie wrote: So we upgraded the instances to 16 GB and we rarely encounter this now. We then increased the batch size to 500 instead of 50, and it worked for our test data. But when we tried a batch size of 1000, the invalid content type error

Re: Docker and Solr Indexing

2018-09-11 Thread solrnoobie
Thank you all for the kind and timely replies. So we upgraded the instances to 16 GB and we rarely encounter this now. We then increased the batch size to 500 instead of 50, and it worked for our test data. But when we tried a batch size of 1000, the invalid content type
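The batch size discussed here (50, then 500, then 1000 documents per request) is simply how many documents are grouped into one update request. A minimal, generic sketch of that grouping, assuming the documents arrive as any iterable. The helper name is hypothetical; in SolrJ each batch would typically be sent with a single `SolrClient.add(...)` call on a collection of documents.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Larger batches mean fewer requests but bigger individual requests, so the size is a tuning knob to test against the container's memory rather than a fixed rule.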

Re: Docker and Solr Indexing

2018-09-11 Thread Jan Høydahl
g the leader shard > will restart after around 2 minutes or less of index time (batches of 50 docs > with 3 threads in our app thread pool). Because of the container > restart, indexing will fail because SolrJ will throw an invalid content type > exception because of the quick co
