Re: Error when indexing against a specific dynamic field type

2018-05-01 Thread THADC
Erick, thanks for the response. I have a number of documents in our database where solr is throwing the same exception against *_tsing types. However, when I index against the same document with our solr 4.7, it is successfully indexed. So, I assume something is different between 4.7 and 7.3. I

Re: Error when indexing against a specific dynamic field type

2018-05-01 Thread Shawn Heisey
On 5/1/2018 8:40 AM, THADC wrote: > I get the following exception: > > *Exception writing document id FULL_36265 to the index; possible analysis > error: Document contains at least one immense term in > field="gridFacts_tsing" (whose UTF8 encoding is longer than the max length > 32766), all of

Re: Error when indexing against a specific dynamic field type

2018-05-01 Thread Steve Rowe
- perhaps the analyzer used by this dynamic field should change? Alternatively, you could: a) truncate long values so that a prefix makes it through the indexing process, e.g. by adding TruncateTokenFilterFactory[1] to alphaOnlySort’s analyzer, or by adding TruncateFieldUpdateProcessorFactory[2
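As a rough client-side illustration of option (a), the sketch below cuts a value down so that its UTF-8 encoding fits under a byte limit before the document is sent. It is not the behavior of TruncateTokenFilterFactory or TruncateFieldUpdateProcessorFactory, which are configured inside Solr. The function name `truncate_utf8` and the choice to cut on bytes are assumptions made for illustration; the 32766 limit is the one quoted in the exception in this thread.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the "immense term" exception


def truncate_utf8(value: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Return the longest prefix of value whose UTF-8 encoding fits in max_bytes."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a trailing partial multi-byte sequence, if any,
    # so the result is always valid text.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

Cutting on bytes rather than characters matters here because the limit in the error message is expressed in UTF-8 bytes, and non-ASCII characters take more than one byte each.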

Re: Error when indexing against a specific dynamic field type

2018-05-01 Thread Erick Erickson
You're sending it a huge term. My guess is you're sending something like base64-encoded data or perhaps just a single unbroken string in your field. Examine your document, it should jump out at you. Best, Erick On Tue, May 1, 2018 at 7:40 AM, THADC wrote: >
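One way to "examine your document" programmatically is sketched below. It assumes the document is available as a Python dict and that the field is indexed as a single term, so the whole value counts against the 32766-byte limit quoted in the exception. The helper name `find_immense_fields` is hypothetical.

```python
MAX_TERM_BYTES = 32766  # limit quoted in the "immense term" exception


def find_immense_fields(doc: dict, max_bytes: int = MAX_TERM_BYTES) -> dict:
    """Map field name -> largest UTF-8 byte length, for string values above max_bytes."""
    oversized = {}
    for field, value in doc.items():
        # Treat single values and multi-valued fields the same way.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, str):
                size = len(item.encode("utf-8"))
                if size > max_bytes:
                    oversized[field] = max(size, oversized.get(field, 0))
    return oversized


# Example with the field name from the error message and made-up content.
doc = {"id": "FULL_36265", "gridFacts_tsing": "x" * 40000}
print(find_immense_fields(doc))
```

Any field reported here is a candidate for the unbroken string or encoded blob described above.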

Error when indexing against a specific dynamic field type

2018-05-01 Thread THADC
Hello, We are migrating from solr 4.7 to 7.3. When I encounter a data item that matches a custom dynamic field from our 4.7 schema: ** , I get the following exception: *Exception writing document id FULL_36265 to the index; possible analysis error: Document contains at least one immense term

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-25 Thread Denis Demichev
https://wiki.apache.org/solr/SolrPerformanceProblems#RAM > > > When we were doing the initial indexing, the indexing processes would get > > to a point where the updates were taking minutes to complete and the > cause > > was throttled write ops. > > Indexing speed

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-25 Thread Emir Arnautović
> some of them are RUNNING > - updateExecutor-N-thread-M threads are in parked mode and number of docs > that I am able to submit is still low > - Tried to change maxIndexingThreads, set it to something high. This seems to > prolong the time when cluster is accepting new indexing re

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-24 Thread Shawn Heisey
the initial indexing, the indexing processes would get to a point where the updates were taking minutes to complete and the cause was throttled write ops. Indexing speed is indeed affected by disk speed, and adding memory can't fix that particular problem.  Using a storage controller with a large

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-24 Thread Mikhail Khludnev
running. Some of them are > blocked, some of them are RUNNING > - updateExecutor-N-thread-M threads are in parked mode and number of docs > that I am able to submit is still low > - Tried to change maxIndexingThreads, set it to something high. This seems > to prolong the time when cluster is ac

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-24 Thread Chris Ulicny
like our when we started indexing in the cloud instances. There might be an equivalent metric for AWS, but Google had the number of throttled reads and writes available (albeit through StackDriver) that we could track. When we were doing the initial indexing, the indexing processes would get

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-23 Thread Denis Demichev
-M threads are in parked mode and number of docs that I am able to submit is still low - Tried to change maxIndexingThreads, set it to something high. This seems to prolong the time when cluster is accepting new indexing requests and keeps CPU utilization a lot higher while the cluster is merging indexes Co

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-19 Thread Erick Erickson
When all indexing threads are occupied merging, incoming updates block until at least one thread frees up IIUC. The fact that you're not opening searchers doesn't matter as far as merging is concerned, that happens regardless on hard commits. Bumping your ram buffer up to 2G is usually

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-19 Thread Denis Demichev
maxBufferedDocs>50 > > Update handler: > > Could you please help me understand how can I validate this theory? > > Another note here. Even if I remove the stress from the cluster I still > > see that merging thread is consuming CPU for some time. It may take hours > >

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-19 Thread Mikhail Khludnev
> and if I try to return the stress back nothing changes. > If this is overloaded merging process, it should take some time to reduce > the queue length and it should start accepting new indexing requests. > Maybe I am wrong, but I need some help to understand how to check it. > >

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-19 Thread Denis Demichev
time. It may take hours and if I try to return the stress back nothing changes. If this is overloaded merging process, it should take some time to reduce the queue length and it should start accepting new indexing requests. Maybe I am wrong, but I need some help to understand how to check it. AWS

Re: SolrCloud cluster does not accept new documents for indexing

2018-04-19 Thread Erick Erickson
" OpenJDK Runtime Environment (build > 1.8.0_161-b14) OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode) > Zookeeper - 3 standalone nodes on t2.large running under Exhibitor > > Symptoms: > 1. 4 instances running 4 threads each are using SolrJ client to submit > docume

Re: SolrCloud indexing

2018-04-16 Thread Erick Erickson
hrase searches? We changed our mind". "We decided to support 14 new use cases." ;)... On Sun, Apr 15, 2018 at 10:32 PM, Moshe Recanati | KMS <mos...@kmslh.com> wrote: > Hi Shawn, > Thank you. > I just need to run full indexing due to massive changes in the docu

RE: SolrCloud indexing

2018-04-15 Thread Moshe Recanati | KMS
Hi Shawn, Thank you. I just need to run full indexing due to massive changes in the document. Regards, Moshe Recanati CTO Mobile  + 972-52-6194481 Skype    :  recanati More at:  www.kmslh.com | LinkedIn | FB -Original Message- From: Shawn Heisey <apa...@elyograg.org> Sent:

RE: SolrCloud indexing

2018-04-15 Thread Moshe Recanati | KMS
r-user <solr-user@lucene.apache.org> Subject: Re: SolrCloud indexing I think you're saying you want to prove out the upgrade in some kind of test setup then switch live traffic. What's commonly used for that is collection aliasing. You just create a new collection and populate it and check i
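The aliasing approach mentioned above can be sketched as follows: build the new collection, check it, then point an alias at it with the Collections API CREATEALIAS action. The base URL, alias name, and collection name are placeholders, and the helper `create_alias_url` is hypothetical; it only builds the URL and does not contact Solr.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base: str, alias: str, collection: str) -> str:
    """Build a Collections API URL that points an alias at a collection."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{solr_base.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder names: clients query "products"; the freshly built index is "products_v2".
print(create_alias_url("http://localhost:8983/solr", "products", "products_v2"))
```

Because clients keep querying the alias, switching it to the newly populated collection moves live traffic without changing client configuration.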

Re: SolrCloud indexing

2018-04-15 Thread Erick Erickson
KMS wrote: >> >> >> We’re using SolrCloud as part of our product solution for High >> Availability. >> >> During upgrade of a version we need to run full index build on our Solr >> data. >> > > What are you upgrading? If it's Solr, you should pause

Re: SolrCloud indexing

2018-04-15 Thread Shawn Heisey
On 4/15/2018 1:22 AM, Moshe Recanati | KMS wrote: We’re using SolrCloud as part of our product solution for High Availability. During upgrade of a version we need to run full index build on our Solr data. What are you upgrading?  If it's Solr, you should pause/stop indexing while you

SolrCloud indexing

2018-04-15 Thread Moshe Recanati | KMS
Hi, We're using SolrCloud as part of our product solution for High Availability. During upgrade of a version we need to run full index build on our Solr data. I would like to know if as part of SolrCloud we can manage it and make sure that items are available during the index so only once

Re: Indexing fails with partially done

2018-04-11 Thread Emir Arnautović
Hi Neo, My DIH knowledge is a bit rusty, but I think that in best case, depending on your queries you might be able to use delta update to “resume” indexing, but it is likely that you cannot do that. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticse

Re: Indexing fails with partially done

2018-04-11 Thread Shawn Heisey
On 4/11/2018 6:46 AM, neotorand wrote: > with Solrcloud What happens if indexing is partially completed and ensemble > goes down.What are the ways to Resume.In one of the scenario i am using 3 ZK > Node in ensemble.Lets say i am indexing 5 million data and i have partially > inde

Re: Indexing fails with partially done

2018-04-11 Thread neotorand
Thanks Emir with context to DIH do we have any Resume mechanism? Regards Neo -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Indexing fails with partially done

2018-04-11 Thread Emir Arnautović
. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 11 Apr 2018, at 14:46, neotorand <neotor...@gmail.com> wrote: > > with Solrcloud What happens if indexing is partially compl

Indexing fails with partially done

2018-04-11 Thread neotorand
With SolrCloud, what happens if indexing is partially completed and the ensemble goes down? What are the ways to resume? In one scenario I am using 3 ZK nodes in the ensemble. Let's say I am indexing 5 million documents and have partially indexed the data when the ZK ensemble goes down. What should

Re: Ignore Field from indexing

2018-04-11 Thread Emir Arnautović
Hi, You have two options when it comes to updating: 1. Send complete document with the same id that will replace existing document. 2. Use atomic updates to send changes, but note that fields need to be stored:
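Option 2 might look roughly like the sketch below, which builds a JSON body using Solr's atomic update "set" modifier. The unique key field and the field names in the example are placeholders, and the helper `atomic_set_update` is hypothetical; the body would be posted to the collection's update handler.

```python
import json


def atomic_set_update(key_field: str, key_value: str, changes: dict) -> str:
    """Build an atomic update body that sets only the listed fields."""
    doc = {key_field: key_value}
    for field, new_value in changes.items():
        # {"set": value} replaces that one field; fields not listed are left as they are.
        doc[field] = {"set": new_value}
    return json.dumps([doc])


print(atomic_set_update("id", "user@example.com", {"title_s": "New title"}))
```

Leaving a field out of `changes` is one way to avoid overwriting a value that already exists, subject to the stored-fields requirement noted above.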

Ignore Field from indexing

2018-04-10 Thread swap
Hi, I have a document indexed. Email-Id is the unique key in the document. On updating, I need to ignore a few fields if they already exist. Please let me know if something more is required. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Three Indexing Questions

2018-03-29 Thread Shawn Heisey
On 3/29/2018 3:59 PM, Terry Steichen wrote: > First question: When indexing content in a directory, Solr's normal > behavior is to recursively index all the files found in that directory > and its subdirectories.  However, turns out that when the files are of > the form *.eml (email)

Re: Indexing multi level Nested JSON using curl

2018-03-29 Thread Zheng Lin Edwin Yeo
Hi, Does anyone know if we can make any change to the split=/|/orgs in the curl URL command to achieve the indexing of the multi-level Nested JSON? Regards, Edwin On 26 March 2018 at 17:30, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > Hi, > > I'm trying to index the

Re: Three Indexing Questions

2018-03-29 Thread Erik Hatcher
of it, but there’s no reason it couldn’t be evolved to handle those things. Erik > > I note this message that's displayed when I begin indexing: "Entering > auto mode. File endings considered are > xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf

Three Indexing Questions

2018-03-29 Thread Terry Steichen
First question: When indexing content in a directory, Solr's normal behavior is to recursively index all the files found in that directory and its subdirectories.  However, turns out that when the files are of the form *.eml (email), solr won't do that.  I can use a wildcard to get it to index

Re: Help Needed - Indexing Related

2018-03-27 Thread Shawn Heisey
On 3/27/2018 6:08 AM, YELESWARAPU, VENKATA BHAN wrote: > Hope you are doing well. I have been struggling with indexing for a week now. > Yesterday I deleted all indexing files and tried re-indexing. It failed > saying unable to open a new searcher. Also that _0.si file is missing.

RE: Help Needed - Indexing Related

2018-03-27 Thread YELESWARAPU, VENKATA BHAN
Information Classification: ** Limited Access Team, Fyi..Deleting the indexing job queue table resolved the issue and it generated the index files. Thank you, Dutt _ From: YELESWARAPU, VENKATA BHAN Sent: Tuesday, March 27, 2018 8:08 AM To: 'solr

RE: Help Needed - Indexing Related

2018-03-27 Thread YELESWARAPU, VENKATA BHAN
@lucene.apache.org Subject: Re: Help Needed - Indexing Related Since this is a scheduled job I think you can get rid of commits and optimize which are invoked from scheduled job. On Tue, Mar 27, 2018 at 6:13 PM, YELESWARAPU, VENKATA BHAN < vyeleswar...@statestreet.com> wrote: > Information Classific

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
response Sujay. > Solr Version - 4.3.1 > Yes, we are using client api to generate index files. > I don't see those parameters configured outside or in the logs, but > indexing job is scheduled, which I think will take care of these. > We have the option to schedule it to run i

RE: Help Needed - Indexing Related

2018-03-27 Thread YELESWARAPU, VENKATA BHAN
Information Classification: ll Limited Access Thanks for your response Sujay. Solr Version - 4.3.1 Yes, we are using client api to generate index files. I don't see those parameters configured outside or in the logs, but indexing job is scheduled, which I think will take care of these. We have

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
is current TPS and expected TPS? On Tue, Mar 27, 2018 at 5:38 PM, YELESWARAPU, VENKATA BHAN < vyeleswar...@statestreet.com> wrote: > Information Classification: ** Limited Access > > Hi Solr Team, > > Hope you are doing well. I have been struggling with indexing for a week

Help Needed - Indexing Related

2018-03-27 Thread YELESWARAPU, VENKATA BHAN
Information Classification: ** Limited Access Hi Solr Team, Hope you are doing well. I have been struggling with indexing for a week now. Yesterday I deleted all indexing files and tried re-indexing. It failed saying unable to open a new searcher. Also that _0.si file is missing. Today I

Indexing multi level Nested JSON using curl

2018-03-26 Thread Zheng Lin Edwin Yeo
23"},{"name2_ss": "edwin","phone2_ss":"456"}] }, { "name1_s" : "Apple", "city_s" : "Cupertino", "zip_s" : 95014, "orgs":[{"name2_ss":"alan","phone2_s

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Thanks for the input. I have got this to work by using cygwin. Regards, Edwin On 23 March 2018 at 07:04, Chris Hostetter wrote: > : > : Ah, there's the extra bit of context: > : > PS C:\curl> .\curl ' > : > : You're using Windows perhaps? If so, it's probably a

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Yes, I'm running this on Windows, using Windows Powershell "curl" command. Will try out other tools like cygwin. Thank you. Regards, Edwin On 23 March 2018 at 06:52, Yonik Seeley wrote: > Ah, there's the extra bit of context: > > PS C:\curl> .\curl ' > > You're using

Re: Error in indexing JSON with space in value

2018-03-22 Thread Chris Hostetter
: : Ah, there's the extra bit of context: : > PS C:\curl> .\curl ' : : You're using Windows perhaps? If so, it's probably a shell issue : getting all of the data to the "curl" command. Yep.. and you can even see in the trace output that curl thinks the entire JSON payload you want to send is

Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
Ah, there's the extra bit of context: > PS C:\curl> .\curl ' You're using Windows perhaps? If so, it's probably a shell issue getting all of the data to the "curl" command. Something like cygwin or WSL (Windows Subsystem for Linux) may make your life easier. -Yonik On Thu, Mar 22, 2018 at
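Besides cygwin or WSL, another way to sidestep shell quoting altogether is to send the request from a short script. The sketch below is an assumption-laden illustration, not something proposed in the thread: it reuses the URL and split parameter from the original command, and the helper `build_json_docs_request` is hypothetical. It only builds the request; the commented-out line would send it.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_json_docs_request(update_url: str, split: str, payload: dict) -> urllib.request.Request:
    """Build a POST to /update/json/docs with a percent-encoded split parameter."""
    query = urlencode({"split": split})  # "/" and "|" are percent-encoded
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{update_url}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_docs_request(
    "http://localhost:8983/solr/collection1/update/json/docs",
    "/|/orgs",
    {"id": "1", "name_s": "Joe Smith", "orgs": [{"name1_s": "Microsoft"}]},
)
print(req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually send the document
```

Since the JSON body never passes through a shell, spaces and quotes inside values such as "Joe Smith" reach Solr unchanged.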

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Thanks for your reply. This is the curl command that I run, with the "--trace -" output. PS C:\curl> .\curl 'http://localhost:8983/solr/collection1/update/json/docs? split=/|/orgs ' -H 'Content-type:application/j son' -d ' {

Re: Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
Thanks for your reply. PS C:\curl> .\curl ' http://localhost:8983/edm/emails6/update/json/docs?split=/|/orgs' -H 'Content-type:application/j son' -d ' { "id":"1", "name_s": "Joe Smith", "phone_s": 876876687, "orgs": [ { "name1_s": "Microsoft", "city_s": "Seattle",

Re: Error in indexing JSON with space in value

2018-03-22 Thread Yonik Seeley
It looks like a curl globbing issue from the curl error message you included: "curl: (3) [globbing] bad range specification in column 39" You can try turning off curl globbing with the -g param. That may not be the only issue though, as the command shown shouldn't have triggered curl globbing.

Re: Error in indexing JSON with space in value

2018-03-22 Thread Shawn Heisey
"zip_s" : 98052}, > { > "name1_s" : "Apple", > "city_s" : "Cupertino", > "zip_s" : 95014} > ] > }' > > However, I get the following error during the indexing. > > { > "respo

Re: Error in indexing JSON with space in value

2018-03-22 Thread Chris Hostetter
"org.apache.solr.common.SolrException"], "msg":"Raw data can be stored only if split=/", "code":400}} Are you *certain* it was a plain old space character, and that you didn't somehow get an EOF character or NUL byte in there somehow? Can y

Error in indexing JSON with space in value

2018-03-22 Thread Zheng Lin Edwin Yeo
"Joe Smith", "phone_s": 876876687, "orgs": [ { "name1_s" : "Microsoft", "city_s" : "Seattle", "zip_s" : 98052}, { "name1_s" : "Apple", "city_s" : "

Re: Indexing multi level Nested JSON

2018-03-22 Thread Zheng Lin Edwin Yeo
name2_ss":"edwin","phone2_ss":"456"}] }, { "name1_s" : "Apple", "city_s" : "Cupertino", "zip_s" : 95014, "orgs":[{"name2_ss":"alan","phone2_ss":"

Re: Error when indexing with SolrJ HTTP ERROR 405

2018-03-20 Thread Shawn Heisey
On 3/18/2018 9:46 PM, Khalid Moustapha Askia wrote: SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/#/corename").build(); When I remove the "#" It throws a NullPointerException URLs with # in them will ONLY work in a browser. They will not work for SolrJ.
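To see why the "#" form cannot work outside a browser: everything after "#" is a URL fragment, which the browser keeps to itself and never sends to the server. The sketch below splits the admin UI URL quoted above and rebuilds a core base URL from it. It assumes the admin UI layout shown in the thread (core name directly after "#/"); the helper `core_base_url` is hypothetical.

```python
from urllib.parse import urlsplit


def core_base_url(admin_ui_url: str) -> str:
    """Turn an admin UI URL like .../solr/#/corename into .../solr/corename."""
    parts = urlsplit(admin_ui_url)
    # The core name lives in the fragment, which is never sent over HTTP.
    core = parts.fragment.strip("/").split("/")[0]
    if not core:
        raise ValueError("no core name found after '#'")
    return f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}/{core}"


print(core_base_url("http://localhost:8983/solr/#/corename"))
```

The resulting URL is the shape a client library expects as its base, with handlers such as /update appended to it by the client.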

Re: Indexing multi level Nested JSON

2018-03-20 Thread Zheng Lin Edwin Yeo
Hi Mikhail, Thanks for your reply. Meaning the only way to identify them is to add in the fields, like Eg: contentType during indexing? Regards, Edwin On 20 March 2018 at 16:34, Mikhail Khludnev <m...@apache.org> wrote: > Edwin, > You need to add necessary fields into child/gr

Re: Error when indexing with SolrJ HTTP ERROR 405

2018-03-20 Thread Zheng Lin Edwin Yeo
eed to send binary content instead of html. Atleast that is what the > > error shows. > > > > I also think the url is wrong. The correct url should have > > http://localhost:8983/solr/core/update > > > > > > Check first whether indexing is working on the same

Re: Indexing multi level Nested JSON

2018-03-20 Thread Mikhail Khludnev
"3", > > "comments_s":"SolrCloud supports it too!", > > "_version_":1595334082096529408}, > > { > > "name_s":"alan", > > "phone_s":"123", >

Re: Indexing multi level Nested JSON

2018-03-19 Thread Zheng Lin Edwin Yeo
334082096529408}, > { > "name_s":"alan", > "phone_s":"123", > "_version_":1595334082096529408}, > { > "name_s":"edwin", > "phone_s":"456", >

Re: Error when indexing with SolrJ HTTP ERROR 405

2018-03-19 Thread Vincenzo D'Amore
t is what the > error shows. > > I also think the url is wrong. The correct url should have > http://localhost:8983/solr/core/update > > > Check first whether indexing is working on the same data that you are > trying to or not using the browser based tools. Check the url for the same.

Re: Error when indexing with SolrJ HTTP ERROR 405

2018-03-19 Thread Shamik Sinha
You need to send binary content instead of html. At least that is what the error shows. I also think the url is wrong. The correct url should have http://localhost:8983/solr/core/update Check first whether indexing is working on the same data that you are trying to or not using the browser based

Error when indexing with SolrJ HTTP ERROR 405

2018-03-19 Thread Khalid Moustapha Askia
e "#" It throws a NullPointerException I have been struggling for a week with this indexing...

Indexing multi level Nested JSON

2018-03-18 Thread Zheng Lin Edwin Yeo
529408}, { "id":"3a", "comments_s":"SolrCloud supports it too 2!", "_version_":1595334082096529408}]}, { "id":"2", "title_s":"New Lucene and Solr release is out", "contenttype_s":"parentDocument", "signature":"", "_version_":1595334082099675136, "_childDocuments_":[ { "name_s":"alan", "phone_s":"123", "_version_":1595334082099675136}, { "name_s":"edwin", "phone_s":"456", "_version_":1595334082099675136}, { "id":"4", "comments_s":"Lots of new features", "_version_":1595334082099675136}]}, { "id":"5", "title_s":"Testing of Nested JSON", "contenttype_s":"parentDocument", "signature":"", "_version_":1595334082101772288, "_childDocuments_":[ { "name_s":"alan", "phone_s":"123", "_version_":1595334082101772288}, { "name_s":"edwin", "phone_s":"456", "_version_":1595334082101772288}, { "id":"6", "comments_s":"See if this is a child", "_version_":1595334082101772288}]}] }} Is Solr able to support the indexing of multi level Nested JSON? I have tested this on Solr 6.5.1. Regards, Edwin

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-12 Thread spoonerk
u for your patience. I said that the > above phenomenon was caused by the IO, cpu, memory, and network io. The > swap was turned off and the machine's memory was sufficient. When the speed > of indexing is declining, QTime is found to take 3 seconds to 4 seconds to > reload

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-12 Thread 苗海泉
Thanks Erick and Shawn , Thank you for your patience. I said that the above phenomenon was caused by the IO, cpu, memory, and network io. The swap was turned off and the machine's memory was sufficient. When the speed of indexing is declining, QTime is found to take 3 seconds to 4 seconds

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-10 Thread spoonerk
cribed many times. But I still get emails from the > > list. Can some admin please unsubscribe me? > > > > On Mar 9, 2018 9:52 PM, "苗海泉" <mseaspr...@gmail.com> wrote: > > > >> hello,We found a problem. In solr 6.0, the indexing speed of solr is &

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-10 Thread Erick Erickson
e manually unsubscribed many times. But I still get emails from the > list. Can some admin please unsubscribe me? > > On Mar 9, 2018 9:52 PM, "苗海泉" <mseaspr...@gmail.com> wrote: > >> hello,We found a problem. In solr 6.0, the indexing speed of solr is >>

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-10 Thread spoonerk
I have manually unsubscribed many times. But I still get emails from the list. Can some admin please unsubscribe me? On Mar 9, 2018 9:52 PM, "苗海泉" <mseaspr...@gmail.com> wrote: > hello,We found a problem. In solr 6.0, the indexing speed of solr is > influenced by the numb

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-10 Thread Shawn Heisey
On 3/10/2018 9:44 AM, Erick Erickson wrote: There are quite a number of reasons you may be seeing this, all having to do with trying to put too much stuff in too little hardware. At any rate, there's no a-priori limit to the number of collections/replicas/whatever that Solr can deal with, the

Re: The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-10 Thread Erick Erickson
unreasonable to assume that you can keep adding more and more and more collections/replicas per JVM and not eventually hit the limits of your hardware. And you mention that indexing slows down, which leads me to assume that you're adding more and more docs to each of these collections

The Impact of the Number of Collections on Indexing Performance in Solr 6.0

2018-03-09 Thread 苗海泉
Hello, we found a problem. In Solr 6.0, the indexing speed is influenced by the number of collections. The speed is normal before the limit is reached. Once the limit is reached, the indexing speed drops by a factor of 50. In our environment, there are 49 Solr nodes. If each

Re: Indexing nested json

2018-03-08 Thread Rick Leir
> >Thanks for your response, >Jams > >On 3/8/18, 1:26 PM, "Mikhail Khludnev" <m...@apache.org> wrote: > > Will >https://lucene.apache.org/solr/guide/7_1/transforming-and-indexing-custom-json.html >work >for you? > >On Thu, Ma

Re: Indexing nested json

2018-03-08 Thread kasinger, james
udnev" <m...@apache.org> wrote: Will https://lucene.apache.org/solr/guide/7_1/transforming-and-indexing-custom-json.html work for you? On Thu, Mar 8, 2018 at 8:17 PM, kasinger, james < james.kasin...@nordstrom.com> wrote: > Hi folks,

Re: Indexing nested json

2018-03-08 Thread Mikhail Khludnev
Will https://lucene.apache.org/solr/guide/7_1/transforming-and-indexing-custom-json.html work for you? On Thu, Mar 8, 2018 at 8:17 PM, kasinger, james < james.kasin...@nordstrom.com> wrote: > Hi folks, > Has anyone had success indexing nested json into solr? I know that so

Re: Indexing nested json

2018-03-08 Thread Shawn Heisey
On 3/8/2018 10:17 AM, kasinger, james wrote: > Has anyone had success indexing nested json into solr? I know that solr > prefers a flattened representation of the data, but I’m exploring potential > solutions or workarounds for achieving this. Thanks in advance. > > For instan

Indexing nested json

2018-03-08 Thread kasinger, james
Hi folks, Has anyone had success indexing nested json into solr? I know that solr prefers a flattened representation of the data, but I’m exploring potential solutions or workarounds for achieving this. Thanks in advance. For instance I’m indexing this “document” and expect it to be presented

Re: Indexing timeout issues with SolrCloud 7.1

2018-03-01 Thread Tom Peters
726.html>). I decided to try and rewrite our indexing code to use delete by ID as opposed to delete by query (we deployed it today) and it appears to have significantly improved the indexing performance and reliability of the replicas. > On Feb 26, 2018, at 12:08 AM, Erick Erickson &l
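For readers comparing the two delete styles mentioned above, the sketch below builds the corresponding JSON update bodies. The ids and the query string are placeholders, and the helper names are hypothetical; the thread does not show the poster's actual indexing code.

```python
import json


def delete_by_ids(ids: list) -> str:
    """Build a JSON update body that deletes documents by unique key."""
    return json.dumps({"delete": list(ids)})


def delete_by_query(query: str) -> str:
    """Build a JSON update body that deletes every document matching a query."""
    return json.dumps({"delete": {"query": query}})


print(delete_by_ids(["doc-1", "doc-2"]))
print(delete_by_query("category_s:obsolete"))
```

Switching from the second form to the first requires the indexing code to know the ids of the documents it wants to remove, which is the rewrite described above.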

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-03-01 Thread 苗海泉
Thank you for your advice on gc tools, what do you suggest to me? 2018-02-28 23:57 GMT+08:00 Shawn Heisey : > On 2/28/2018 2:53 AM, 苗海泉 wrote: > >> Thanks for your detailed advice, the monitor product you are talking about >> is good, but our solr system is running on a

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Shawn Heisey
On 2/28/2018 2:53 AM, 苗海泉 wrote: Thanks for your detailed advice, the monitor product you are talking about is good, but our solr system is running on a private network and seems to be unusable at all, with no single downloadable application for analyzing specific gc logs. For analyzing GC

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
>>>>> Desired survivor size 109051904 bytes, new threshold 15 (max 15) >>>>> - age 1: 47719032 bytes, 47719032 total >>>>> , 0.0554183 secs] >>>>> [Parallel Time: 48.0 ms, GC Workers: 18] >>>>>[GC Worker Start (ms): Min:

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread 苗海泉
> [Ext Root Scanning (ms): Min: 2.9, Avg: 5.7, Max: 47.4, Diff: 44.6, > >>> Sum: 103.0] > >>> [Update RS (ms): Min: 0.0, Avg: 14.3, Max: 16.2, Diff: 16.2, Sum: > >>> 257.6] > >>>[Processed Buffers: Min: 0, Avg: 17.4, Max:

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-28 Thread Emir Arnautović
(ms): Min: 0.0, Avg: 16.6, Max: 17.6, Diff: 17.6, Sum: >>> 299.1] >>>[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: >> 18] >>> [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: >>> 0.5] >>> [GC Worker Tot

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Sum: 857.6] > > [GC Worker End (ms): Min: 4668018376.7, Avg: 4668018376.8, Max: > > 4668018376.8, Diff: 0.0] > > [Code Root Fixup: 0.2 ms] > > [Code Root Purge: 0.0 ms] > > [Clear CT: 0.2 ms] > > [Other: 7.1 ms] > > [Choose CSet: 0.0 ms] >

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
4 ms] > [Eden: 1552.0M(1552.0M)->0.0B(1488.0M) Survivors: 80.0M->144.0M Heap: > 25.8G(32.0G)->24.4G(32.0G)] > Heap after GC invocations=1144024 (full 72): > garbage-first heap total 33554432K, used 25550050K [0x7f147800, > 0x7f1478808000, 0x7f1c7800) >

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
sematext.com>: > Ah, so there are ~560 shards per node and not all nodes are indexing at > the same time. Why is that? You can have better throughput if indexing on > all nodes. If happy with shard size, you can create new collection with 49 > shards every 2h and have everything th

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Ah, so there are ~560 shards per node and not all nodes are indexing at the same time. Why is that? You can have better throughput if indexing on all nodes. If happy with shard size, you can create new collection with 49 shards every 2h and have everything the same and index on all nodes. Back

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
ff, but with 25x2=50 shards and 49 nodes, > one node will need to handle double indexing load. > > Emir > -- > Monitoring - Log Management - Alerting - Anomaly Detection > Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > > > >

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
and if it is append only system, old shards will keep caches until reloaded. Probably will not make much diff, but with 25x2=50 shards and 49 nodes, one node will need to handle double indexing load. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consul
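The arithmetic behind the "25x2=50 shards and 49 nodes" remark can be checked with a few lines. The sketch assumes cores are spread as evenly as possible across nodes, which is a simplification; actual placement in a cluster may differ. The function name `shards_per_node` is hypothetical.

```python
def shards_per_node(total_shards: int, nodes: int) -> list:
    """Spread shards as evenly as possible and return the count per node."""
    base, extra = divmod(total_shards, nodes)
    # The first `extra` nodes each take one more shard than the rest.
    return [base + 1 if i < extra else base for i in range(nodes)]


counts = shards_per_node(25 * 2, 49)
print(max(counts), counts.count(max(counts)))
```

With 50 cores on 49 nodes, exactly one node ends up with two cores of the active collection while every other node has one, which is the imbalance described above.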

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
In addition, we found that the rate was normal when the number of collections was kept below 936 and the speed was slower and slower at 984. Therefore, we could only temporarily delete the older collection, but now we need more Online collection, there has been no good way to confuse us for a long

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
Thank you for the reply. Each collection has 25 shards and one replica, and each Solr node has about 5T on disk. GC was checked and modified as follows: SOLR_JAVA_MEM="-Xms32768m -Xmx32768m " GC_TUNE=" \ -XX:+UseG1GC \ -XX:+PerfDisableSharedMem \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \

Re: When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread Emir Arnautović
Hi, To get more complete picture, can you tell us how many shards/replicas do you have per collection? Also what is index size on disk? Did you check GC? BTW, using 32GB heap prevents you from using compressed oops, resulting in less memory available than 31GB. Thanks, Emir -- Monitoring - Log

When the number of collections exceeds one thousand, the construction of indexing speed drops sharply

2018-02-27 Thread 苗海泉
I encountered a serious problem while using Solr. We use Solr version 6.0, ingest about 500 billion documents per day, and create a collection every hour; there are more than a thousand collections online across 49 Solr nodes. If the number of collections is less than 800, the speed is

Re: Challenges of Indexing Email

2018-02-26 Thread Erick Erickson
oblems. Best, Erick On Mon, Feb 26, 2018 at 7:58 AM, Terry Steichen <te...@net-frame.com> wrote: > Thanks Karthik. > > (1) I thought the fix would be in 7.2.1, but it is not. Any idea when > it will be available? > > (2) Is there any way to force Solr indexing to treat

Re: Challenges of Indexing Email

2018-02-26 Thread Terry Steichen
Thanks Karthik. (1) I thought the fix would be in 7.2.1, but it is not.  Any idea when it will be available? (2) Is there any way to force Solr indexing to treat an email message (or thread) as plain text? Terry On 02/26/2018 10:37 AM, Karthik Ramachandran wrote: > There is bug rep

Re: Challenges of Indexing Email

2018-02-26 Thread Karthik Ramachandran
olr 7.2.1 and trying to index (among other documents) > individual emails and collected email threads. Ideally, the indexing > would parse the email messages into their constituent fields. But, for > my purposes, an acceptable alternative is to merely index the messages as > unstructur

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-26 Thread Shawn Heisey
it takes about 15s to > complete. In Solr 7.1, it's taking about 5m. If I remove the deleteByQuery, > the indexing times are nearly identical. SolrCloud did not exist in Solr version 3.4.  It was introduced in version 4.0.  Did you mean version 4.3?  If you really did mean 3.4, then a direct c

Challenges of Indexing Email

2018-02-26 Thread Terry Steichen
I am using Solr 7.2.1 and trying to index (among other documents) individual emails and collected email threads.  Ideally, the indexing would parse the email messages into their constituent fields.  But, for my purposes, an acceptable alternative is to merely index the messages as unstructured text

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-25 Thread Erick Erickson
> this manner (deletes then index). In Solr 3.4, it takes about 15s to > complete. In Solr 7.1, it's taking about 5m. If I remove the deleteByQuery, > the indexing times are nearly identical. > > When run in normal production mode where we have lots of processes indexing > at o

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-24 Thread Deepak Goel
it takes about 15s to complete. In Solr 7.1, it's taking about 5m. If I remove the deleteByQuery, the indexing times are nearly identical. When run in normal production mode where we have lots of processes indexing at once (~20 or so), it starts to cause lots of issues (which you see below).

RE: autosuggestion indexing in a solr cluster

2018-02-23 Thread Deepak Udapudi
org>; Anupama Pullela <apull...@delta.org>; Segar Soundiramourthy <ssoundiramour...@delta.org> Subject: autosuggestion indexing in a solr cluster Hi all, We are using a Solr cluster. We have Solr configuration for auto-suggestions as shown below. Specialty

autosuggestion indexing in a solr cluster

2018-02-23 Thread Deepak Udapudi
Hi all, We are using a Solr cluster. We have Solr configuration for auto-suggestions as shown below. Specialty specialty specialty AnalyzingInfixLookupFactory specialty_suggester_infix_dir DocumentDictionaryFactory

Re: Indexing timeout issues with SolrCloud 7.1

2018-02-23 Thread Tom Peters
a number of times where I indexed 1500 documents in this manner (deletes then index). In Solr 3.4, it takes about 15s to complete. In Solr 7.1, it's taking about 5m. If I remove the deleteByQuery, the indexing times are nearly identical. When run in normal production mode where we have lots
