Re: In-place re-indexing after DocValue schema change

2020-01-29 Thread moscovig
Thank you, Emir. I tried this locally (changing the schema, then re-indexing everything in place) and I wasn't able to sort on the docValues fields anymore (someone mentioned this before on that forum: https://lucene.472066.n3.nabble.com/DocValues-error-td4240116.html), getting the following error: "Error from server

Re: In-place re-indexing after DocValue schema change

2020-01-29 Thread Emir Arnautović
Hi, 1. No, it’s not valid. Solr will look at the schema to see whether it can use docValues or has to uninvert the field, and it assumes that all fields will have doc values. You might expect anything from wrong results to errors if you do something like that. 2. Not sure if it would work, but it is not better

In-place re-indexing after DocValue schema change

2020-01-29 Thread moscovig
Hi all, We are about to alter our schema with some DocValue annotations. According to the docs, we should either delete all docs and re-insert them, or create a new collection with the new schema. 1. Is it valid to modify the schema in the current collection, where all documents were created without

Indexing HTML Metatags Nutch - SOLR

2020-01-18 Thread kra...@gds2.de
nd various other document formats via HTTP/HTTPS and indexing the crawled content into Solr. More plugins are available to support more indexing backends, to fetch ftp:// and file:// URLs, for focused crawling, and many other use cases. http.robot.rules.whitelist sitlux02.sit.

Re: need for re-indexing when using managed schema

2019-12-16 Thread Erick Erickson
field won’t return any docs indexed before the change until the older docs are re-indexed. So you can see where this is going. “If you add a field _and then reindex all your documents_, it’s perfectly safe. However, between the time you add the field and the re-indexing is complete, your results

need for re-indexing when using managed schema

2019-12-16 Thread Joseph Lorenzini
Hi all, I have question about the managed schema functionality. According to the docs, "All changes to a collection’s schema require reindexing". This would imply that if you use a managed schema and you use the schema API to update the schema, then doing a full re-index is necessary each time.

Re: Indexing with customized parameters

2019-12-12 Thread Anuj Bhargava
Emir, thanks. Perfect. On Thu, 12 Dec 2019 at 13:40, Emir Arnautović wrote: > Hi Anuj, > Maybe I am missing something but this is more a question for an SQL group > than for the Solr group. I am surprised that you get any records. You can > consult your DB documentation for some more elegant

Re: Indexing with customized parameters

2019-12-12 Thread Emir Arnautović
Hi Anuj, Maybe I am missing something but this is more a question for an SQL group than for the Solr group. I am surprised that you get any records. You can consult your DB documentation for some more elegant solution, but a brute-force solution, if your column is a string, could be: WHERE sector =

Re: Indexing with customized parameters

2019-12-11 Thread Anuj Bhargava
Any suggestions? Regards, Anuj On Tue, 10 Dec 2019 at 20:52, Anuj Bhargava wrote: > I am trying to index where the *sector field* has the values 27 and/or > 2701 and/or 2702 using the following - > >query="SELECT * FROM country WHERE sector = 27 OR sector = 2701 OR > sector = 2702" >

Re: Indexing strategies for user profiles

2019-12-10 Thread Dave
> just that, I would also like to get relevant products for a user based on > some sort of collaborative filtering. What should my indexing > and collection creation strategy be to tackle this problem in general?

Indexing strategies for user profiles

2019-12-10 Thread Arnold Bronley
that, I would also like to get relevant products for a user based on some sort of collaborative filtering. What should my indexing and collection creation strategy be to tackle this problem in general?

Indexing with customized parameters

2019-12-10 Thread Anuj Bhargava
I am trying to index the records where the *sector field* has the values 27 and/or 2701 and/or 2702 using the following - The sector field has multiple comma-separated values, like: 27,19,527; 38,27,62701; 2701,49; 55,2702,327. The issue is that when I run the above, it indexes the fields containing data
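The requirement here is to match whole codes inside a comma-separated value, so that 27 matches 27,19,527 while a value containing only 527 or 62701 does not. As a minimal illustration of that whole-token rule only (plain Python, not a DIH or SQL fix; the helper name has_any_sector is hypothetical):

```python
def has_any_sector(sector_field: str, wanted: set[str]) -> bool:
    """Return True if any comma-separated code in sector_field is in wanted.

    Codes are compared as whole tokens, so "27" matches "38,27,62701"
    but not "527" or "62701".
    """
    codes = {code.strip() for code in sector_field.split(",") if code.strip()}
    return not codes.isdisjoint(wanted)


WANTED = {"27", "2701", "2702"}
# The first four values come from the message; the last one is a made-up non-match.
rows = ["27,19,527", "38,27,62701", "2701,49", "55,2702,327", "527,62701"]
print([row for row in rows if has_any_sector(row, WANTED)])
```

Any SQL-side filter would need to express the same whole-token comparison rather than a numeric or substring match.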

Re: Solr indexing performance

2019-12-05 Thread Shawn Heisey
r more likely that the indexing would fail. It's not likely that such problems would make indexing slow. Thanks, Shawn

Re: Solr indexing performance

2019-12-05 Thread Paras Lehana
olr Cloud setup where the client is indexing in 5 > > parallel threads with 5000 docs per batch. This is a test setup and all > > documents are indexed on the same node. We are seeing connection timeout > > issues some time into indexing. I am yet to analyze GC p

Re: Solr indexing performance

2019-12-05 Thread Shawn Heisey
On 12/5/2019 10:28 AM, Rahul Goswami wrote: We have a Solr 7.2.1 Solr Cloud setup where the client is indexing in 5 parallel threads with 5000 docs per batch. This is a test setup and all documents are indexed on the same node. We are seeing connection timeout issues some time

Re: Solr indexing performance

2019-12-05 Thread Vincenzo D'Amore
Hi, are the clients reusing their SolrClient? Ciao, Vincenzo -- mobile: 3498513251 skype: free.dev > On 5 Dec 2019, at 18:28, Rahul Goswami wrote: > > Hello, > > We have a Solr 7.2.1 Solr Cloud setup where the client is indexing in 5 > parallel threads with 5

Solr indexing performance

2019-12-05 Thread Rahul Goswami
Hello, We have a Solr 7.2.1 Solr Cloud setup where the client is indexing in 5 parallel threads with 5000 docs per batch. This is a test setup and all documents are indexed on the same node. We are seeing connection timeout issues some time into indexing. I am yet to analyze GC pauses

Re: solr 8.3 indexing wrong values in some fields

2019-12-03 Thread Odysci
nly numbers, or only alpha chars, etc.). I ran this program immediately > > after the index updating and it did not detect any problems. > > > > Then I started regular use of the system, indexing new documents, and I > > noticed that some fields were getting the wrong values. For

Re: solr 8.3 indexing wrong values in some fields

2019-12-02 Thread Colvin Cowie
heck the values of all fields in the solr > docs, for consistency (e.g., fields which are supposed to have > only numbers, or only alpha chars, etc.). I ran this program immediately > after the index updating and it did not detect any problems. > > Then I started regular use of the

solr 8.3 indexing wrong values in some fields

2019-12-01 Thread Odysci
, for consistency (e.g., fields which are supposed to have only numbers, or only alpha chars, etc.). I ran this program immediately after the index updating and it did not detect any problems. Then I started regular use of the system, indexing new documents, and I noticed that some fields were getting the wrong
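Odysci describes a program that checks every field for consistency, for example fields that should contain only digits or only letters. A minimal sketch of such a checker, assuming documents are plain dicts already fetched from Solr; the field names and patterns below are hypothetical:

```python
import re

# Hypothetical per-field rules: every value of the field must fully match.
FIELD_RULES = {
    "zip_code": re.compile(r"[0-9]+"),
    "country": re.compile(r"[A-Za-z]+"),
}


def inconsistent_fields(doc: dict) -> list[str]:
    """Return the names of fields whose values break their rule."""
    bad = []
    for field, pattern in FIELD_RULES.items():
        values = doc.get(field)
        if values is None:
            continue  # field absent: nothing to check
        if not isinstance(values, list):
            values = [values]  # treat single values like multi-valued ones
        if any(not pattern.fullmatch(str(v)) for v in values):
            bad.append(field)
    return bad


docs = [
    {"id": "1", "zip_code": "90210", "country": "US"},
    {"id": "2", "zip_code": "Brazil", "country": "12345"},  # swapped values
]
for doc in docs:
    print(doc["id"], inconsistent_fields(doc))
```

Running a check like this right after a bulk update and again after normal indexing resumes is the comparison the message describes.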

Re: Solr 8.2 indexing issues

2019-11-21 Thread Jörn Franke
You are jumping two major versions. You probably need to delete the collections (fully, not only with a delete command) and reindex. > On 12.11.2019 at 21:42, Sujatha Arun wrote: > > We recently migrated from 6.6.2 to 8.2. We are seeing issues with indexing > where the leader and

Re: Solr 8.2 indexing issues

2019-11-21 Thread Rahul Goswami
ted from 6.6.2 to 8.2. We are seeing issues with > indexing > > where the leader and the replica document counts do not match. We get > > different results every time we do a *:* search. > > > > The only issue we see in the logs is Jira issue : Sol

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
you said "But now your > uuid fields will look like this, right?"? > > I finished indexing my 45 million documents successfully by casting the > UUID in the SQL itself like this (that's for a postgres db): > SELECT myuuidfield::text, mypk FROM {{solr__datasource}} > > I'm

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
Thanks both for the advice. Erick, which message were you referring to when you said "But now your uuid fields will look like this, right?"? I finished indexing my 45 million documents successfully by casting the UUID in the SQL itself like this (that's for a postgres db): SELECT m

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Erick Erickson
> make your SQL statement output this as some kind of string. 2> The aforementioned ScriptUpdateProcessor can transform this into "4ee3992e-0b2d-e811-89a7-0025900429ba” with your favorite scripting language 3> use a PatternReplaceCharFilter to transform this before it gets to the in
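Options 2> and 3> both amount to the same string transformation: drop the "java.util.UUID:" prefix that Jörn spotted and keep the canonical UUID. As a hedged sketch of that transformation only (plain Python, not a ScriptUpdateProcessor script or a char filter configuration; the function name normalize_uuid is hypothetical):

```python
import uuid

PREFIX = "java.util.UUID:"  # prefix observed in this thread


def normalize_uuid(raw: str) -> str:
    """Strip a leading "java.util.UUID:" and return the canonical UUID string.

    uuid.UUID raises ValueError if what remains is not a valid UUID,
    so malformed values fail loudly instead of reaching the index.
    """
    value = raw[len(PREFIX):] if raw.startswith(PREFIX) else raw
    return str(uuid.UUID(value.strip()))


print(normalize_uuid("java.util.UUID:4ee3992e-0b2d-e811-89a7-0025900429ba"))
```

Option 1>, casting in SQL (Boris used myuuidfield::text for Postgres), avoids producing the prefix in the first place.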

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
I'm currently re-indexing with the cast to string in the SQL statement. It looks good so far. On Thu, 14 Nov 2019 at 14:13, Jörn Franke wrote: > You can use an updateScript handler to do this kind of postprocessing or > you can cast it in your SQL statement as string. > > > >

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Jörn Franke
>> >> >>> On Thu, 14 Nov 2019 at 13:38, Jörn Franke wrote: >>> It seems there is a prefix java.util.UUID: in front of your UUID. Any idea >>> where it comes from? Is it also like this in the database? Is your import >>> handler maybe re

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
>> On Thu, 14 Nov 2019 at 13:38, Jörn Franke wrote: >> >>> It seems there is a prefix java.util.UUID: in front of your UUID. Any >>> idea where it comes from? Is it also like this in the database? Is your >>> import handler maybe receiving a java object j

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
It seems there is a prefix java.util.UUID: in front of your UUID. Any >> idea where it comes from? Is it also like this in the database? Is your >> import handler maybe receiving a java object java.util.UUID and it is not >> converted correctly to string? >> >> > On 14.

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
> > > I'm running into an issue with Solr 8.3.0: it fails when indexing into a schema > with a UUID field. > > > > I'm using a SolrCloud setup with 3 instances, and I'm using the DIH to > fetch and index the data from a postgres database. > > > > In

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Jörn Franke
let : > > > > Hi, > > I'm running into an issue with Solr 8.3.0: it fails when indexing into a schema with a > UUID field. > > I'm using a SolrCloud setup with 3 instances, and I'm using the DIH to fetch > and index the data from a postgres database. > > In sc

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
e problem happens is made of 3 shards with 3 NRT > replicas each. > > > On Thu, 14 Nov 2019 at 11:52, Boris Chazalet > wrote: > >> >> Hi, >> >> I'm running into an issue with Solr 8.3.0: it fails when indexing into a schema >> with a UUID field. >> >>

Re: Solr 8.2 indexing issues

2019-11-14 Thread Paras Lehana
Hi Sujatha, Apologies that I am not addressing your bug directly but have you tried 8.3 <https://lucene.apache.org/solr/downloads.html> that has just been released? On Wed, 13 Nov 2019 at 02:12, Sujatha Arun wrote: > We recently migrated from 6.6.2 to 8.2. We are seeing issues with

Re: 8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
A few things I forgot to mention: - I'm running on Java 8 - the collection where the problem happens is made of 3 shards with 3 NRT replicas each. On Thu, 14 Nov 2019 at 11:52, Boris Chazalet wrote: > > Hi, > > I'm running into an issue with Solr 8.3.0: it fails when indexing into a schema

8.3.0: Invalid UUID String while indexing document with a UUID field

2019-11-14 Thread Boris Chazalet
Hi, I'm running into an issue with Solr 8.3.0: it fails when indexing into a schema with a UUID field. I'm using a SolrCloud setup with 3 instances, and I'm using the DIH to fetch and index the data from a postgres database. In schema.xml I have: The data-config is a simple select, the uuid

Solr 8.2 indexing issues

2019-11-12 Thread Sujatha Arun
We recently migrated from 6.6.2 to 8.2. We are seeing issues with indexing where the leader and the replica document counts do not match. We get different results every time we do a *:* search. The only issue we see in the logs is Jira issue SOLR-13293. Has anybody seen similar issues? Thanks

Re: Zipped folder indexing in Solr Cloud

2019-11-05 Thread Erick Erickson
If Jörn’s suggestion doesn’t work for you, consider running Tika outside of Solr, here’s some explanation of why you probably want to do that for anything other than prototyping, and some sample code: https://lucidworks.com/post/indexing-with-solrj/ Best, Erick > On Nov 5, 2019, at 7:03

Re: Zipped folder indexing in Solr Cloud

2019-11-05 Thread Jörn Franke
You can unzip it beforehand. Or am I overlooking something? > On 05.11.2019 at 13:00, Biswarup Roy wrote: > > Hello, > > I have a compressed folder (.zip) which contains the PDFs, TXTs, and XML > file. > I am trying to index that folder in Solr Cloud, but I am not able to do > that. > I am

Zipped folder indexing in Solr Cloud

2019-11-05 Thread Biswarup Roy
Hello, I have a compressed folder (.zip) which contains the PDFs, TXTs, and XML file. I am trying to index that folder in Solr Cloud, but I am not able to do that. I am using Solr 8.2. Can you please help me with how I can index that zipped folder in Solr Cloud? I am eagerly waiting for your

Re: NRT vs TLOG bulk indexing performances

2019-10-30 Thread Dominique Bejean
create the collection as NRT or as TLOG gives the same indexing time and the same CPU usage. My impression is that using TLOG replicas produces a 10% to 20% increase in indexing time, depending on the autoCommit maxTime setting. Regards Dominique On Fri, 25 Oct 2019 at 15:46, Erick Erickson wrote: >

Re: NRT vs TLOG bulk indexing performances

2019-10-26 Thread Erick Erickson
"I understand that while a non-leader TLOG replica is copying the index from the leader, the leader stops indexing” This _better_ not be happening. If you can demonstrate this, let’s open a JIRA. > On Oct 25, 2019, at 8:28 AM, Dominique Bejean > wrote: > > I understand that while non leade

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Erick Erickson
I’m also surprised that you see a slowdown; it’s worth investigating. Let’s take the NRT case with only a leader. I’ve seen the NRT indexing time increase when even a single follower was added (30-40% in this case). We believed that the issue was the time the leader sat waiting around

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Ere Maijala
ant to avoid the indexing server handling queries. It can also be used to prefer local replicas to minimize network access. --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Shawn, So, I understand that while a non-leader TLOG replica is copying the index from the leader, the leader stops indexing. A one-shot, large, heavy bulk indexing run should be much more impacted than continuous light indexing. Regards. Dominique On Fri, 25 Oct 2019 at 13:54, Shawn Heisey wrote: > On

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Shawn Heisey
On 10/25/2019 1:16 AM, Dominique Bejean wrote: For collection created with all replicas as NRT * Indexing time : 22 minutes For collection created with all replicas as TLOG * Indexing time : 34 minutes NRT indexes simultaneously on all replicas. So when indexing is done on one

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
est? > > > On 25.10.2019 at 09:16, Dominique Bejean < > dominique.bej...@eolya.fr> wrote: > > > > Hi, > > > > I made some benchmarks for bulk indexing in order to compare performance > > and resource usage for NRT versus TLOG replicas. > > >

Re: NRT vs TLOG bulk indexing performances

2019-10-25 Thread Jörn Franke
Which Solr version are you using and how often did you repeat the test? > On 25.10.2019 at 09:16, Dominique Bejean wrote: > > Hi, > > I made some benchmarks for bulk indexing in order to compare performance > and resource usage for NRT versus TLOG replicas. > > Env

NRT vs TLOG bulk indexing performances

2019-10-25 Thread Dominique Bejean
Hi, I made some benchmarks for bulk indexing in order to compare performance and resource usage for NRT versus TLOG replicas. Environment: * SolrCloud with 4 Solr nodes (8 GB RAM, 4 GB heap) * 1 collection with 2 shards x 2 replicas (all NRT or all TLOG) * 1 core per Solr server Indexing

Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread anup.junagade
Thanks Shawn for checking. As advised we will execute the indexing with the new settings as mentioned and will update the results. Here are the links to missing attachments: Attachment 1: OpenJDK 11 vs OpenJDK 8 key metrics <https://drive.google.com/open?id=1YRlP-vBrxAJ7NpyY_DkjPemU6gigs

Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread Shawn Heisey
On 10/24/2019 11:50 AM, Junagade, Anup wrote: * Attachment 1: OpenJDK 8 vs OpenJDK 8 key metrics * Attachment 2: OpenJDK 8 vs OpenJDK 8 waiting QTP Threads * Attachment 3: OpenJDK 11 Thread dump There are no attachments. Apache mailing lists swallow almost all attachments.

Solr 8.1.1 Indexing issue while migrating Java8 -> Java11

2019-10-24 Thread anup.junagade
We are trying to migrate our SOLR 8.1.1 cluster from OpenJDK Java 8 to OpenJDK Java 11 and are facing issues with Indexing. While our indexing is happening flawlessly on Java 8, it crawls or maybe I should say it stalls with Java 11. Any pointers/help is appreciated. *Symptoms* With OpenJDK

Re: Migration: SOLR8-Java8 -> SOLR8-JAVA11 indexing issue.

2019-10-24 Thread Junagade, Anup
We are trying to migrate our SOLR 8.1.1 cluster from OpenJDK Java 8 to OpenJDK Java 11 and are facing issues with Indexing. While our indexing is happening flawlessly on Java 8, it crawls or maybe I should say it stalls with Java 11. Any pointers/help is appreciated. Symptoms

Lemmatizer for indexing

2019-10-14 Thread Shamik Bandopadhyay
Hi, I'm trying to use a lemmatizer in my analysis chain. Just wondering what the recommended way of achieving this is. I've come across a few different implementations, which are listed below: Open NLP --> https://lucene.apache.org/solr/guide/7_5/language-analysis.html#opennlp-lemmatizer-filter

Re: how to improve indexing using autocommit

2019-10-03 Thread Erick Erickson
and collection2 respectively. Your customer-facing app uses “query”, and your indexing app uses “index”. Now you do your full import to the “index” collection, which is aliased to collection2. When it’s done you point your “query” alias to collection2 and your “index” alias to collection1
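The alias swap Erick describes can be driven through the Collections API CREATEALIAS action, which re-points an existing alias when called again with the same name. A minimal sketch that only builds the two request URLs, assuming a node at http://localhost:8983 and the alias and collection names from the message; it does not send anything:

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: adjust to your cluster


def create_alias_url(alias: str, collection: str) -> str:
    """URL for a Collections API CREATEALIAS call (re-points an existing alias)."""
    query = urlencode({"action": "CREATEALIAS", "name": alias, "collections": collection})
    return f"{SOLR_BASE}/admin/collections?{query}"


def swap_aliases(aliases: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """After a full import into aliases["index"], swap "query" and "index".

    "query" is updated first, so searches move to the freshly built
    collection before the indexing alias is pointed at the old one.
    """
    swapped = {"query": aliases["index"], "index": aliases["query"]}
    urls = [create_alias_url(name, coll) for name, coll in swapped.items()]
    return swapped, urls


current = {"query": "collection1", "index": "collection2"}
new_mapping, calls = swap_aliases(current)
for url in calls:
    print(url)
```

Because the customer-facing app only ever talks to "query", it never sees a half-built index.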

how to improve indexing using autocommit

2019-10-03 Thread babloorawat
Hi, We are performing Solr indexing on a daily basis: a full import once a day and a delta import every 3 hours. We have around 4 docs for indexing. The full import takes around 1 hour 45 minutes and we need to optimize it. I am wondering if anyone can help me figure out

Re: Atomic indexing as default indexing mode in Solr

2019-09-05 Thread Erick Erickson
> >> Because atomic updates require special preparation, specifically all >> original fields must be stored which is not a requirement and is, in fact, >> an anti-pattern in large installations. >> >> Best, >> Erick >> >>> On Sep 4, 2019, at

Re: Atomic indexing as default indexing mode in Solr

2019-09-05 Thread Shankar Ramalingam
> > Best, > Erick > > > On Sep 4, 2019, at 7:51 PM, Arnold Bronley > wrote: > > > > Why is atomic indexing not the default mode of indexing in Solr? That way > > the ownership model of the content changes from document level to field > > level for clients

Re: Atomic indexing as default indexing mode in Solr

2019-09-04 Thread Erick Erickson
Because atomic updates require special preparation, specifically all original fields must be stored, which is not a requirement and is, in fact, an anti-pattern in large installations. Best, Erick > On Sep 4, 2019, at 7:51 PM, Arnold Bronley wrote: > > Why is atomic indexing not th
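To see why the stored fields matter: an atomic update does not patch the index in place; the document is rebuilt from its stored values and the requested operations are applied on top. The toy model below is plain Python, not Solr code (apply_atomic_update is a hypothetical name). It uses the JSON atomic-update shape, e.g. {"views": {"inc": 1}}, and supports only "set", "add" and "inc". A field that was indexed but not stored never appears in the input, so it silently disappears from the rebuilt document:

```python
import copy


def apply_atomic_update(stored_doc: dict, update: dict) -> dict:
    """Toy model: rebuild a document from its stored fields, then apply ops.

    Only values present in stored_doc survive; anything indexed but not
    stored is simply absent here, which is why it would be lost.
    """
    doc = copy.deepcopy(stored_doc)  # leave the caller's dict untouched
    for field, change in update.items():
        if not isinstance(change, dict):  # plain value such as the id
            doc[field] = change
            continue
        (op, value), = change.items()  # exactly one operation per field
        if op == "set":
            doc[field] = value
        elif op == "add":
            current = doc.get(field, [])
            current = current if isinstance(current, list) else [current]
            doc[field] = current + (value if isinstance(value, list) else [value])
        elif op == "inc":
            doc[field] = doc.get(field, 0) + value
        else:
            raise ValueError(f"unsupported operation: {op}")
    return doc


# Hypothetical stored fields: "body" was indexed but not stored, so it is absent.
stored = {"id": "1", "title": "Solr", "tags": ["search"], "views": 1}
update = {"id": "1", "tags": {"add": "lucene"}, "views": {"inc": 1}}
print(apply_atomic_update(stored, update))
```

The same model also shows the ownership point from the question: two clients can each send an update touching different fields, and neither overwrites the other's fields, as long as every field can be recovered from storage.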

Atomic indexing as default indexing mode in Solr

2019-09-04 Thread Arnold Bronley
Why is atomic indexing not the default mode of indexing in Solr? That way the ownership model of the content would change from document level to field level for clients. Multiple clients could contribute to the same Solr document without overwriting each other.

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-03 Thread Mikhail Khludnev
age- > > From: Mikhail Khludnev [mailto:m...@apache.org] > > Sent: Monday, September 02, 2019 12:23 PM > > To: Vadim Ivanov; solr-user > > Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4 > > > > It seems like reasonable behavior. SolrWrite

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-02 Thread Vadim Ivanov
records from DIH. Am I wrong? -- Vadim > -Original Message- > From: Mikhail Khludnev [mailto:m...@apache.org] > Sent: Monday, September 02, 2019 12:23 PM > To: Vadim Ivanov; solr-user > Subject: Re: Idle Timeout while DIH indexing and implicit sharding in 7.4 > > It

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-02 Thread Mikhail Khludnev
> *Sent:* Monday, September 02, 2019 1:31 AM > *To:* solr-user > *Cc:* vadim.iva...@spb.ntk-intourist.ru > *Subject:* Re: Idle Timeout while DIH indexing and implicit sharding in > 7.4 > > > > Given that > > org.apache.solr.common.util.FastInputStream.peek(FastInputStream

Re: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread Mikhail Khludnev
ng same exact issue. We never had any issue with 6.5.1 when doing > full index (initial bulk load) > After upgrading to 7.5.0, getting below exception and indexing is taking a > very long time > > 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_collection s:shard1 > r:core_n

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread swapna.minnaka...@copart.com
I am facing the exact same issue. We never had any issue with 6.5.1 when doing a full index (initial bulk load). After upgrading to 7.5.0, we get the exception below and indexing takes a very long time: 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_collection s:shard1 r:core_node3

RE: Idle Timeout while DIH indexing and implicit sharding in 7.4

2019-09-01 Thread swapna.minnaka...@copart.com
I am facing the exact same issue. We never had any issue with 6.5.1 when doing a full index (initial bulk load). After upgrading to 7.5.0, we get the exception below and indexing takes a very long time: 2019-09-01 10:11:27.436 ERROR (qtp1650813924-22) [c:c_member_lots_a s:shard1 r:core_node3

Re: Query-time synonyms without indexing

2019-08-29 Thread Erick Erickson
s.apache.org/jira/browse/LUCENE-8134, I needed to add > an omitTermFreqAndPositions="true" to the declaration. > This has to do with defaults for a string field being different from a text > field, and in Solr 8+ indexing fails because of the above ticket. > Adding omitTermFreqAnd

Re: Query-time synonyms without indexing

2019-08-29 Thread Bjarke Buur Mortensen
h defaults for a string field being different from a text field, and in Solr 8+ indexing fails because of the above ticket. Adding omitTermFreqAndPositions="true" ensures that the index field type and the schema field type agree on the settings, as I understand it. Regards, Bjarke On Wed, 28 Aug.

Re: Query-time synonyms without indexing

2019-08-28 Thread Erick Erickson
Not sure. You have an section and section. Frankly I’m not sure which one will be used for the index-time chain. Why don’t you just try it? change to reload and go. It’d take you 5 minutes and you’d have your answer. Best, Erick > On Aug 28, 2019, at 1:57 AM, Bjarke Buur Mortensen >

Re: Query-time synonyms without indexing

2019-08-27 Thread Bjarke Buur Mortensen
Yes, but isn't that what I am already doing in this case (look at the fieldType in the original mail)? Is there some other way to specify that field type and achieve what I want? Thanks, Bjarke On Tue, Aug 27, 2019, 17:32 Erick Erickson wrote: > You can have separate index and query time

Re: Query-time synonyms without indexing

2019-08-27 Thread Erick Erickson
You can have separate index and query time analysis chains; there are many examples in the stock Solr schemas. Best, Erick > On Aug 27, 2019, at 8:48 AM, Bjarke Buur Mortensen > wrote: > > We have a Solr field of type "string". > It turns out that we need to do synonym expansion at query time

Query-time synonyms without indexing

2019-08-27 Thread Bjarke Buur Mortensen
We have a Solr field of type "string". It turns out that we need to do synonym expansion at query time in order to account for some changes over time in the values stored in that field. So we have tried introducing a custom fieldType that applies the synonym filter at query time only (see bottom

Re: Solr indexing for unstructured data

2019-08-22 Thread Alexandre Rafalovitch
In Admin UI, there is schema browsing screen: https://lucene.apache.org/solr/guide/8_1/schema-browser-screen.html That shows you all the fields you have, their configuration and their (tokenized) indexed content. This seems to be a good midpoint between indexing and querying. So, I would check

Solr indexing for unstructured data

2019-08-22 Thread amrit pattnaik
Hi, I am new to Solr. I have a scenario where PDF documents with unstructured data have been parsed as text and kept in a separate directory. Now, once I build a collection and index using "bin/post -c collection name document name", the document gets indexed and

Re: Slow Indexing scaling issue

2019-08-19 Thread Furkan KAMACI
h some info about why loading Solr with the job > of extracting text is not optimal speed-wise: > > https://lucidworks.com/post/indexing-with-solrj/ > > > On Aug 13, 2019, at 12:15 PM, Jan Høydahl wrote: > > > > You may want to review > https://c

Re: Indexing information on number of attachments and their names in EML file

2019-08-14 Thread Zheng Lin Edwin Yeo
nt; filename="file1.pdf" > Content-Transfer-Encoding: base64 > Content-ID: > X-Attachment-Id: f_jpurtpnk0 > > Regards, > Edwin > > On Sat, 3 Aug 2019 at 05:38, Tim Allison wrote: > >> I'd strongly recommend rolling your own ingest code. See Erick'

Re: Slow Indexing scaling issue

2019-08-13 Thread Erick Erickson
Here’s some sample SolrJ code using Tika outside of Solr’s Extracting Request Handler, along with some info about why loading Solr with the job of extracting text is not optimal speed-wise: https://lucidworks.com/post/indexing-with-solrj/ > On Aug 13, 2019, at 12:15 PM, Jan Høydahl wr

Re: Slow Indexing scaling issue

2019-08-13 Thread Jan Høydahl
cluster slow and unstable. Better to use Tika or similar on the client side and send text docs to Solr. Jan Høydahl > On 13 Aug 2019 at 16:52, Parmeshwor Thapa wrote: > > Hi, > > We are having some issues scaling Solr indexing. Looking for suggestions. > > Setup: We hav

Slow Indexing scaling issue

2019-08-13 Thread Parmeshwor Thapa
Hi, We are having some issues scaling Solr indexing. Looking for suggestions. Setup: We have two SolrCloud (7.4) instances running in separate cloud VMs with an external ZooKeeper ensemble. We are sending async / non-blocking HTTP requests to index documents in Solr. 2 cloud VMs ( 4 core

Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Zheng Lin Edwin Yeo
rolling your own ingest code. See Erick's > superb: https://lucidworks.com/post/indexing-with-solrj/ > > You can easily get attachments via the RecursiveParserWrapper, e.g. > > https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parse

Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Tim Allison
I'd strongly recommend rolling your own ingest code. See Erick's superb: https://lucidworks.com/post/indexing-with-solrj/ You can easily get attachments via the RecursiveParserWrapper, e.g. https://github.com/apache/tika/blob/master/tika-parsers/src/test/java/org/apache/tika/parser

Re: Indexing information on number of attachments and their names in EML file

2019-08-02 Thread Jan Høydahl
; > Regards, > Edwin > > On Thu, 1 Aug 2019 at 09:38, Zheng Lin Edwin Yeo > wrote: > >> Hi, >> >> I would like to check: is there any way we can detect the number of >> attachments and their names during indexing of EML files in Solr, and index >

Re: Indexing information on number of attachments and their names in EML file

2019-08-01 Thread Zheng Lin Edwin Yeo
names during indexing of EML files in Solr, and index > that information into Solr? > > Currently, Solr is able to use Tika and Tesseract OCR to extract the > contents of the attachments. However, I could not find the information > about the number of attachments in the EML file and what a

Indexing information on number of attachments and their names in EML file

2019-07-31 Thread Zheng Lin Edwin Yeo
Hi, I would like to check: is there any way we can detect the number of attachments and their names during indexing of EML files in Solr, and index that information into Solr? Currently, Solr is able to use Tika and Tesseract OCR to extract the contents of the attachments. However, I could

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-30 Thread David Smiley
On Tue, Jul 30, 2019 at 4:41 PM Sanders, Marshall (CAI - Atlanta) < marshall.sande...@coxautoinc.com> wrote: > I’ll explain the context around the use case we’re trying to solve and > then attempt to respond as best I can to each of your points. What we have > is a list of documents that in our

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-30 Thread Sanders, Marshall (CAI - Atlanta)
true" distErrPct="0.025" maxDistErr="0.09" > distanceUnits="kilometers" > spatialContextFactory="Geo3D"/> > > > We’ve tried indexing some different data, but to keep it as simple as >

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-25 Thread David Smiley
ot;"/> > > class="solr.SpatialRecursivePrefixTreeFieldType" >geo="true" distErrPct="0.025" maxDistErr="0.09" > distanceUnits="kilometers" > spatialContextFactory="Geo3D"/>

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-25 Thread Sanders, Marshall (CAI - Atlanta)
That didn't seem to work either. I think there must be something wrong with how we're indexing/storing the polygon and/or how we've configured the field/querying it. The docs are so sparse on this ( Here's the response: { "responseHeader":{ "status":0,

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-25 Thread Ere Maijala
ldType" > >geo="true" distErrPct="0.025" maxDistErr="0.09" > distanceUnits="kilometers" > > spatialContextFactory="Geo3D"/> > > > > > > We’ve tri

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-24 Thread Sanders, Marshall (CAI - Atlanta)
> geo="true" distErrPct="0.025" maxDistErr="0.09" distanceUnits="kilometers" > spatialContextFactory="Geo3D"/> > > > We’ve tried indexing some different data, but

Re: Solr Geospatial Polygon Indexing/Querying Issue

2019-07-24 Thread Ere Maijala
the info from schema: > multiValued="true"/> > > class="solr.SpatialRecursivePrefixTreeFieldType" >geo="true" distErrPct="0.025" maxDistErr="0.09" > distanceUnits="kilometers" >

Solr Geospatial Polygon Indexing/Querying Issue

2019-07-23 Thread Sanders, Marshall (CAI - Atlanta)
in the future so don’t want to rely on it). Here’s the info from schema: We’ve tried indexing some different data, but to keep it as simple as possible we started with a triangle (will eventually add more points to approximate a circle). Here’s an example document that we’ve added just

Re: indexing slow in solr 8.0.0

2019-07-12 Thread Jan Høydahl
You halved the CPU and see slower indexing. That is to be expected. But you fail to tell us any real details about your setup, your docs, how you index, how you measure throughput, what your bottleneck is, etc. Also note that you get better throughput when indexing for the first time than

Re: SolrCloud indexing triggers merges and timeouts

2019-07-12 Thread Rahul Goswami
Upon further investigation on this issue, I see the below log lines during the indexing process: 2019-06-06 22:24:56.203 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734

indexing slow in solr 8.0.0

2019-07-12 Thread derrick cui
: three servers: 8-core CPU, 32 GB RAM, 300 GB SSD; indexing 400k only needs 5 minutes; collection: 3 shards / 2 replicas / 3 nodes. Now: hardware: three servers: 4-core CPU, 32 GB RAM, 300 GB SSD; indexing 400k: less than 1 documents per minute; collection: 3 shards / 2 replicas / 3 nodes. Does anyone know what could cause

Re: Indexing nested document: Solr 8.1.1

2019-07-11 Thread sreejith.variyath
Hi, I was using the URL *http://localhost:8983/solr/my-core/update/json/docs*. It was wrong. I should have used *http://localhost:8983/solr/my-core/update*, and it worked. Thanks -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
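For reference, Solr's JSON update format (posted to /update) accepts anonymous child documents under the "_childDocuments_" key, which is the key that appears in the error message. A hedged sketch that only builds the URL and request body in plain Python; the core name comes from the message, the field values and the helper name nested_update_request are hypothetical, and nothing is sent:

```python
import json

CORE_URL = "http://localhost:8983/solr/my-core"  # core name from the message


def nested_update_request(parent: dict, children: list[dict]) -> tuple[str, str]:
    """Return (url, body) for indexing one parent with anonymous children.

    The body is a JSON array of documents for the /update handler, with
    the children under "_childDocuments_". Per this thread, posting to
    /update/json/docs instead led to the "multiple values" error.
    """
    doc = dict(parent)
    doc["_childDocuments_"] = list(children)
    return f"{CORE_URL}/update", json.dumps([doc])


url, body = nested_update_request(
    {"id": "1", "title": "parent"},
    [{"id": "2", "title": "child one"}, {"id": "3", "title": "child two"}],
)
print(url)
print(body)
```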

Indexing nested document: Solr 8.1.1

2019-07-11 Thread Sreejith Variyath
Hi, I am trying to index a sample nested document in Solr, but I am getting the error "ERROR: [doc=1] multiple values encountered for non multiValued field _childDocuments_.id: [2, 3]". I am using ClassicIndexSchemaFactory, so I have defined all the fields in schema.xml. Below are my field settings in

Re: SolrCloud indexing triggers merges and timeouts

2019-07-05 Thread Rahul Goswami
; >> Since one thread can only do 1 merge at any given point of time, how > does > >> maxMergeCount being greater than maxThreadCount help anyway? I am having > >> difficulty wrapping my head around this, and would appreciate if you > could > >> help clear it for m

Re: SolrCloud indexing triggers merges and timeouts

2019-07-03 Thread Erick Erickson
trols the number of merges that can be > *scheduled* at the same time. As soon as that number of merges is reached, > the indexing thread(s) will be paused until the number of merges in the > schedule drops below this number. This ensures that no more merges will be > scheduled
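The quoted explanation separates merges that are running (bounded by maxThreadCount) from merges that are scheduled (bounded by maxMergeCount), which is what Rahul's question turns on. The toy model below is not Lucene's ConcurrentMergeScheduler; it is a hedged illustration of those two limits exactly as described in the message (ToyMergeSchedule is a hypothetical name). With maxThreadCount=1 and maxMergeCount=3, a second merge can wait in the schedule without pausing indexing; only the third one does:

```python
from dataclasses import dataclass


@dataclass
class ToyMergeSchedule:
    """Toy model of the two limits described in the message (not Lucene code)."""

    max_thread_count: int  # merges that may run at the same time
    max_merge_count: int   # merges that may be scheduled at the same time
    scheduled: int = 0     # merges currently scheduled (running or waiting)

    def running(self) -> int:
        return min(self.scheduled, self.max_thread_count)

    def waiting(self) -> int:
        return self.scheduled - self.running()

    def indexing_paused(self) -> bool:
        # Indexing threads pause once the schedule is full.
        return self.scheduled >= self.max_merge_count


s = ToyMergeSchedule(max_thread_count=1, max_merge_count=3)
for n in range(4):
    s.scheduled = n
    print(n, s.running(), s.waiting(), s.indexing_paused())
```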
