Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread Shawn Heisey
On 2/21/2021 3:07 PM, cratervoid wrote: Thanks Shawn, I copied the solrconfig.xml file from the gettingstarted example on 7.7.3 installation to the 8.8.0 installation, restarted the server and it now works. Comparing the two files it looks like as you said this section was left out of the

Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread cratervoid
Thanks Shawn, I copied the solrconfig.xml file from the gettingstarted example on 7.7.3 installation to the 8.8.0 installation, restarted the server and it now works. Comparing the two files it looks like as you said this section was left out of the _default/solrconfig.xml file in version 8.8.0:

Re: HTML sample.html not indexing in Solr 8.8

2021-02-21 Thread cratervoid
Thanks Alex. I copied the solrconfig.xml over from 7.7.3 to the 8.8.0 conf folder and restarted the server. Now indexing works without erroring on sample.html. There is 1K difference between the 2 files so I'll diff them to see what was left out of the 8.8 version. On Sat, Feb 20, 2021 at 4:27

Re: HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread Alexandre Rafalovitch
, Alex. On Sat, 20 Feb 2021 at 17:59, cratervoid wrote: > > I am trying out indexing the exampledocs in the examples folder with the > SimplePostTool on windows 10 using solr 8.8. All the documents index > except sample.html. For that file I get the errors below. I then > downlo

Re: HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread Shawn Heisey
On 2/20/2021 3:58 PM, cratervoid wrote: SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/gettingstarted/update/extract?resource.name=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html=C%3A%5Csolr-8.8.0%5Cexample%5Cexampledocs%5Csample.html

HTML sample.html not indexing in Solr 8.8

2021-02-20 Thread cratervoid
I am trying out indexing the exampledocs in the examples folder with the SimplePostTool on windows 10 using solr 8.8. All the documents index except sample.html. For that file I get the errors below. I then downloaded solr 7.7.3 and indexed the exampledocs folder with no errors, including

Re: Urgent- General Question about document Indexing frequency in solr

2021-02-04 Thread Scott Stults
t; > Looking for some help on document indexing frequency. I am using apache > solr 7.7 and SolrNet library to commit documents to Solr. Summary for this > function is: > // Summary: > // Commits posted documents, blocking until index changes are flushed > to disk and > //

Urgent- General Question about document Indexing frequency in solr

2021-02-03 Thread Manisha Rahatadkar
Hi All Looking for some help on document indexing frequency. I am using apache solr 7.7 and SolrNet library to commit documents to Solr. Summary for this function is: // Summary: // Commits posted documents, blocking until index changes are flushed to disk and // blocking until a new

Re: NRT - Indexing

2021-02-02 Thread Dominique Bejean
Hi, The issue was buildOnCommit=true on a SuggestComponent. Dominique On Tue, Feb 2, 2021 at 00:54, Shawn Heisey wrote: > On 2/1/2021 12:08 AM, haris.k...@vnc.biz wrote: > > Hope you're doing good. I am trying to configure NRT - Indexing in my > > project. For this

Re: NRT - Indexing

2021-02-01 Thread Shawn Heisey
On 2/1/2021 12:08 AM, haris.k...@vnc.biz wrote: Hope you're doing good. I am trying to configure NRT - Indexing in my project. For this reason, I have configured *autoSoftCommit* to execute every second and *autoCommit* to execute every 5 minutes. Everything works as expected on the dev

Re: NRT - Indexing

2021-02-01 Thread Dominique Bejean
u grep your solr logs on with the "commit' pattern in order to see > > hard and soft commit occurrences ? > > How are you pushing new docs or updates in the collection ? > > > Regards. > > > Dominique > > > > > > On Mon, Feb 1, 2021 at 08:08,

Re: NRT - Indexing

2021-02-01 Thread haris . khan
solr logs on with the "commit' pattern in order to see hard and soft commit occurrences? How are you pushing new docs or updates in the collection? Regards. Dominique On Mon, Feb 1, 2021 at 08:08, haris.k...@vnc.biz wrote: Hello, Hope you're doing good. I am trying to configure NRT - Indexing

Re: NRT - Indexing

2021-02-01 Thread Dominique Bejean
Mon, Feb 1, 2021 at 08:08, wrote: > Hello, > > Hope you're doing good. I am trying to configure NRT - Indexing in my > project. For this reason, I have configured *autoSoftCommit* to execute > every second and *autoCommit* to execute every 5 minutes. Everything > works as expec

Re: NRT - Indexing

2021-02-01 Thread Mr Havercamp
I'm running into the same issue. I've set autoSoftCommit and autoCommit but the speed at which docs are indexed seems to be inconsistent with the settings. I have lowered the autoCommit to a minute but it still takes a few minutes for docs to show after indexing. Soft commit settings also seem

NRT - Indexing

2021-01-31 Thread haris . khan
Hello, Hope you're doing good. I am trying to configure NRT - Indexing in my project. For this reason, I have configured autoSoftCommit to execute every second and autoCommit to execute every 5 minutes. Everything works as expected on the dev and test server. But on the production server

NRT - Indexing

2021-01-29 Thread haris . khan
Hello, Hope you're doing good. I am trying to configure NRT - Indexing in my project. For this reason, I have configured autoSoftCommit to execute every second and autoCommit to execute every 5 minutes. Everything works as expected on the dev and test server. But on the production server
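For reference, the two intervals described in this post can also be set through Solr's Config API instead of editing solrconfig.xml by hand. The sketch below only builds the JSON body for a `set-property` command, assuming it is POSTed to `/solr/<collection>/config`; the `updateHandler.*` property names are the Config API's commonly used properties, and `openSearcher=false` on the hard commit is our assumption (the usual pairing when soft commits handle visibility), so check both against the reference guide for your Solr version.

```python
import json


def commit_settings_body(soft_commit_ms: int, hard_commit_ms: int,
                         open_searcher: bool = False) -> str:
    """Build a Config API body that sets the soft and hard autocommit intervals.

    Soft commits control when new documents become visible to searches;
    hard commits flush segments to disk. Both intervals are in milliseconds.
    """
    if soft_commit_ms <= 0 or hard_commit_ms <= 0:
        raise ValueError("commit intervals must be positive milliseconds")
    return json.dumps({
        "set-property": {
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            "updateHandler.autoCommit.openSearcher": open_searcher,
        }
    })


# The intervals from the post: soft commit every second, hard commit every 5 minutes.
body = commit_settings_body(soft_commit_ms=1_000, hard_commit_ms=5 * 60 * 1_000)
print(body)
```

Whatever the intervals, a component that does heavy work on every commit can still delay visibility; in this thread the cause turned out to be `buildOnCommit=true` on a SuggestComponent.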

Re: Re:Interpreting Solr indexing times

2021-01-13 Thread Alessandro Benedetti
I agree, documents may be gigantic or very small, with heavy text analysis or simple strings ... so it's not possible to give an evaluation here. But you could make use of the nightly benchmark to give you an idea of Lucene indexing speed (the engine inside Apache Solr) : http://home.apache.org

RE: [Solr8.7] Indexing only some language ?

2021-01-10 Thread Bruno Mannina
Perfect! Thanks! -Original Message- From: xiefengchang [mailto:fengchang_fi...@163.com] Sent: Sunday, January 10, 2021 04:50 To: solr-user@lucene.apache.org Subject: Re:[Solr8.7] Indexing only some language ? Take a look at the document here: https://lucene.apache.org/solr/guide/8_7

Re:Interpreting Solr indexing times

2021-01-10 Thread xiefengchang
it's hard to answer your question without your solrconfig.xml, managed-schema(or schema.xml), and good to have some log snippet as well~ At 2021-01-07 21:28:00, "ufuk yılmaz" wrote: >Hello all, > >I have been looking at our SolrCloud indexing performance stat

Re:[Solr8.7] Indexing only some language ?

2021-01-09 Thread xiefengchang
Take a look at the document here: https://lucene.apache.org/solr/guide/8_7/dynamic-fields.html#dynamic-fields here's the point: "a field that does not match any explicitly defined fields can be matched with a dynamic field." so I guess the priority is quite clear~ At

[Solr8.7] Indexing only some language ?

2021-01-09 Thread Bruno Mannina
Hello, I would like to define in my schema.xml some text_xx fields. I have patent titles in several languages. Only 6 of them (EN, IT, FR, PT, ES, DE) interest me. I know how to define these 6 fields, I use text_en, text_it etc. i.e. for English language: But I have more than 6

Interpreting Solr indexing times

2021-01-07 Thread ufuk yılmaz
Hello all, I have been looking at our SolrCloud indexing performance statistics and trying to make sense of the numbers. We are using a custom Flume sink and sending updates to Solr (8.4) using SolrJ. I know these stuff depend on a lot of things but can you tell me if these statistics

Re: Indexing performance 7.3 vs 8.7

2020-12-23 Thread Bram Van Dam
On 23/12/2020 16:00, Ron Buchanan wrote: > - both run Java 1.8, but 7.3 is running HotSpot and 8.7 is running > OpenJDK (and a bit newer) If you're using G1GC, you probably want to give Java 11 a go. It's an easy thing to test, and it's had a positive impact for us. Your mileage may

Indexing performance 7.3 vs 8.7

2020-12-23 Thread Ron Buchanan
(this is long, just trying to be thorough) I'm working on upgrading from Solr 7.3 to Solr 8.7 and I am seeing a significant drop in indexing throughput during a full index reload - from ~1300 documents per second to ~450 documents/sec Background: VM hosts (these are configured identically

Re: SOLR 8.6.0 date Indexing Issues.

2020-11-20 Thread Jörn Franke
utilities that allow you to do this transformation easily. > On 20.11.2020 at 21:50, Fiz N wrote: > > Hello Experts, > > I am having issues with indexing Date field in SOLR 8.6.0. I am indexing > from MongoDB. In MongoDB the Format is as follows > > > *

SOLR 8.6.0 date Indexing Issues.

2020-11-20 Thread Fiz N
Hello Experts, I am having issues with indexing Date field in SOLR 8.6.0. I am indexing from MongoDB. In MongoDB the Format is as follows * "R_CREATION_DATE" : "12-Jul-18", "R_MODIFY_DATE" : "30-Apr-19", * In my Managed Schema I have the following e
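Solr date fields expect ISO-8601 UTC timestamps such as `2018-07-12T00:00:00Z`, so values like `12-Jul-18` need converting before they are sent. A minimal sketch of that transformation in Python, assuming the MongoDB strings are always `DD-Mon-YY` with English month abbreviations and should be read as midnight UTC (the schema definition is truncated above, so the target field type is also an assumption):

```python
from datetime import datetime, timezone


def to_solr_date(value: str) -> str:
    """Convert a 'DD-Mon-YY' string such as '12-Jul-18' to Solr's ISO-8601 UTC form.

    %b matches English month abbreviations under the default C locale;
    %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999.
    """
    parsed = datetime.strptime(value, "%d-%b-%y").replace(tzinfo=timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_solr_date("12-Jul-18"))  # 2018-07-12T00:00:00Z
print(to_solr_date("30-Apr-19"))  # 2019-04-30T00:00:00Z
```

The same function can run in whatever code reads from MongoDB, so the converted value is what reaches the date field.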

Solr 7.7 Indexing issue

2020-09-30 Thread Manisha Rahatadkar
Hello all We are using Apache Solr 7.7 on Windows platform. The data is synced to Solr using Solr.Net commit. The data is being synced to SOLR in batches. The document size is very huge (~0.5GB average) and solr indexing is taking long time. Total document size is ~200GB. As the solr commit

Re: Exclude a folder/directory from indexing

2020-08-28 Thread Walter Underwood
er() and filterbyname(). > Thus you may wish to consider them or equivalents for inclusion in your > system, whatever that may be. > Thanks, > Joe D. > > On 27/08/2020 20:32, Alexandre Rafalovitch wrote: >> If you are indexing from Drupal into Solr, that's the question for &

Re: Exclude a folder/directory from indexing

2020-08-28 Thread Joe Doupnik
system, whatever that may be. Thanks, Joe D. On 27/08/2020 20:32, Alexandre Rafalovitch wrote: If you are indexing from Drupal into Solr, that's the question for Drupal's solr module. If you are doing it some other way, which way are you doing it? bin/post command? Most likely

Re: Exclude a folder/directory from indexing

2020-08-27 Thread Alexandre Rafalovitch
If you are indexing from Drupal into Solr, that's the question for Drupal's solr module. If you are doing it some other way, which way are you doing it? bin/post command? Most likely this is not the Solr question, but whatever you have feeding data into Solr. Regards, Alex. On Thu, 27 Aug

Exclude a folder/directory from indexing

2020-08-27 Thread Staley, Phil R - DCF
Can you or how do you exclude a specific folder/directory from indexing in SOLR version 7.x or 8.x? Also our CMS is Drupal 8 Thanks, Phil Staley DCF Webmaster 608 422-6569 phil.sta...@wisconsin.gov

Re: SOLR indexing takes longer time

2020-08-18 Thread Walter Underwood
hreaded and >> deprecated anyway. >> 3. Minor point - consider whether you need to index everything every >> time or just the deltas. >> 4. Upgrade Solr anyway, not for speed reasons but because that's a very >> old version you're running. >> >> HTH >> >&g

Re: SOLR indexing takes longer time

2020-08-18 Thread David Hastings
anyway, not for speed reasons but because that's a very > old version you're running. > > HTH > > Charlie > > On 17/08/2020 19:22, Abhijit Pawar wrote: > > Hello, > > > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / > > rep

Re: SOLR indexing takes longer time

2020-08-18 Thread Charlie Hull
every time or just the deltas. 4. Upgrade Solr anyway, not for speed reasons but because that's a very old version you're running. HTH Charlie On 17/08/2020 19:22, Abhijit Pawar wrote: Hello, We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single

Re: SOLR indexing takes longer time

2020-08-17 Thread Aroop Ganguly
Adding on to what others have said, indexing speed in general is largely affected by the parallelism and isolation you can give to each node. Is there a reason why you cannot have more than 1 shard? If you have 5 node cluster, why not have 5 shards, maxshardspernode=1 replica=1 is ok. You should
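Aroop's one-shard-per-node layout can be expressed as a Collections API `CREATE` call. The sketch only builds the URL and assumes SolrCloud mode (the original poster runs a single standalone core on 5.4.1, so this would mean moving to SolrCloud); the collection and configset names are hypothetical.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str, config_name: str,
                          num_shards: int, replication_factor: int = 1,
                          max_shards_per_node: int = 1) -> str:
    """Build a Collections API CREATE URL that places one shard per node."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config_name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "maxShardsPerNode": max_shards_per_node,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Hypothetical names: a 5-node cluster, one shard and one replica per node.
url = create_collection_url("http://localhost:8983/solr", "products", "products_conf",
                            num_shards=5)
print(url)
```

`maxShardsPerNode` applies to the Solr versions discussed in this thread; it was removed in Solr 9.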

Re: SOLR indexing takes longer time

2020-08-17 Thread Shawn Heisey
On 8/17/2020 12:22 PM, Abhijit Pawar wrote: We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do

Re: SOLR indexing takes longer time

2020-08-17 Thread Walter Underwood
while you are indexing. If it is under 50%, the bottleneck is MongoDB and single-threaded indexing. For another check, run that same query in a regular database client and time it. The Solr indexing will never be faster than that. wunder Walter Underwood wun...@wunderwood.org http

Re: SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
... On Mon, Aug 17, 2020 at 1:32 PM Divye Handa wrote: > Can you share the dih configuration you are using for same? > > On Mon, 17 Aug, 2020, 23:52 Abhijit Pawar, wrote: > > > Hello, > > > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / >

Re: SOLR indexing takes longer time

2020-08-17 Thread Jörn Franke
t; Hello, > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / > replicas and just single core. > It takes almost 3.5 hours to index that data. > I am using a data import handler to import data from the mongo database. > > Is there something we c

Re: SOLR indexing takes longer time

2020-08-17 Thread Divye Handa
Can you share the dih configuration you are using for same? On Mon, 17 Aug, 2020, 23:52 Abhijit Pawar, wrote: > Hello, > > We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / > replicas and just single core. > It takes almost 3.5 hours to index that data. >

SOLR indexing takes longer time

2020-08-17 Thread Abhijit Pawar
Hello, We are indexing some 200K plus documents in SOLR 5.4.1 with no shards / replicas and just single core. It takes almost 3.5 hours to index that data. I am using a data import handler to import data from the mongo database. Is there something we can do to reduce the time taken to index

RE: Time-out errors while indexing (Solr 7.7.1)

2020-07-07 Thread Kommu, Vinodh K.
Hi Eric, Toke, Can you please look at the details shared in my trail email & respond with your suggestions/feedback? Thanks & Regards, Vinodh From: Kommu, Vinodh K. Sent: Monday, July 6, 2020 4:58 PM To: solr-user@lucene.apache.org Subject: RE: Time-out errors while indexing (So

Re: Out of memory errors with Spatial indexing

2020-07-06 Thread David Smiley
bug causing an exponential > > explosion of needed grid squares when you have polygons super-close to > the > > pole. Might you try S2PrefixTree instead? I forget if this would fix it > > or not by itself. For indexing non-point data, I recommend > > class="solr.

RE: Time-out errors while indexing (Solr 7.7.1)

2020-07-06 Thread Kommu, Vinodh K.
tal Thanks & Regards, Vinodh -Original Message- From: Erick Erickson Sent: Saturday, July 4, 2020 7:07 PM To: solr-user@lucene.apache.org Subject: Re: Time-out errors while indexing (Solr 7.7.1)

Re: Out of memory errors with Spatial indexing

2020-07-06 Thread Sunil Varma
S2PrefixTree instead? I forget if this would fix it > or not by itself. For indexing non-point data, I recommend > class="solr.RptWithGeometrySpatialField" which internally is based off a > combination of a course grid and storing the original vector geometry fo

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-04 Thread Mad have
you’re running at all unless that 13B is a round number. If you keep adding > documents, your installation will shortly, at best, stop accepting new > documents for indexing. At worst you’ll start seeing weird errors and > possibly corrupt indexes and have to re-index everything fr

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-04 Thread Erick Erickson
, your installation will shortly, at best, stop accepting new documents for indexing. At worst you’ll start seeing weird errors and possibly corrupt indexes and have to re-index everything from scratch. You’ve backed yourself in to a pretty tight corner here. You either have to re-index

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-04 Thread Mad have
the indexing become slow and I also have same impression that the size of the collection is creating this issue. Appreciate if you can suggests any solution on this. Regards, Madhava Sent from my iPhone > On 3 Jul 2020, at 23:30, Erick Erickson wrote: > > Oops, I transposed that

Re: Out of memory errors with Spatial indexing

2020-07-03 Thread David Smiley
Hi Sunil, Your shape is at a pole, and I'm aware of a bug causing an exponential explosion of needed grid squares when you have polygons super-close to the pole. Might you try S2PrefixTree instead? I forget if this would fix it or not by itself. For indexing non-point data, I recommend class

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Erick Erickson
t; swapping, how much of your I/O is just because Lucene can’t >>> hold all the parts of the index it needs in memory at once? Lucene >>> uses MMapDirectory to hold the index and you may well be >>> swapping, see: >>> >>> https://blog.thetaphi.de/2012/0

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Erick Erickson
it.html >> >> But my guess is that you’ve just reached a tipping point. You say: >> >> "From last 2-3 weeks we have been noticing either slow indexing or timeout >> errors while indexing” >> >> So have you been continually adding more

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Mad have
ry at once? Lucene > uses MMapDirectory to hold the index and you may well be > swapping, see: > > https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html > > But my guess is that you’ve just reached a tipping point. You say: > > "From last 2-3 w

Out of memory errors with Spatial indexing

2020-07-03 Thread Sunil Varma
We are seeing OOM errors when trying to index some spatial data. I believe the data itself might not be valid but it shouldn't cause the Server to crash. We see this on both Solr 7.6 and Solr 8. Below is the input that is causing the error. { "id": "bad_data_1", "spatialwkt_srpt": "LINESTRING

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Erick Erickson
may well be swapping, see: https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html But my guess is that you’ve just reached a tipping point. You say: "From last 2-3 weeks we have been noticing either slow indexing or timeout errors while indexing” So have you been contin

Re: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Toke Eskildsen
e than write operations like 100:1 ratio, is this expected during > indexing or solr nodes are doing any other operations like syncing? Are you saying that there are 100 times more read operations when you are indexing? That does not sound too unrealistic as the disk cache might be fille

RE: Time-out errors while indexing (Solr 7.7.1)

2020-07-03 Thread Kommu, Vinodh K.
Anyone has any thoughts or suggestions on this issue? Thanks & Regards, Vinodh From: Kommu, Vinodh K. Sent: Thursday, July 2, 2020 4:46 PM To: solr-user@lucene.apache.org Subject: Time-out errors while indexing (Solr 7.7.1) Hi, We are performing QA performance testing on couple of collect

Re: Solr 8.5.2 indexing issue

2020-07-02 Thread gnandre
It seems that the issue is not with reference_url field itself. There is one copy field which has the reference_url field as source and another field called url_path as destination. This destination field url_path has the following field type definition.

Time-out errors while indexing (Solr 7.7.1)

2020-07-02 Thread Kommu, Vinodh K.
Hi, We are performing QA performance testing on a couple of collections which hold 2 billion and 3.5 billion docs respectively. Indexing happens from a separate client using SolrJ, which uses 10 threads and a batch size of 1000. From last 2-3 weeks we have been noticing either slow indexing or timeout
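The client shape described here (10 threads, batches of 1000) can be sketched as below. This is Python rather than the poster's SolrJ code, and `send` is a hypothetical callable that would POST one batch to Solr. The sketch caps the number of batches in flight so that, when Solr slows down, the client waits instead of queueing ever more requests.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_in_parallel(docs: Iterable[dict], send: Callable[[List[dict]], None],
                      threads: int = 10, batch_size: int = 1000,
                      max_in_flight: int = 20) -> int:
    """Send batches from a thread pool, keeping at most `max_in_flight` pending.

    Returns the number of documents submitted. Any exception raised by `send`
    (for example a timeout) is re-raised here when its batch is collected.
    """
    submitted = 0
    pending = deque()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches(docs, batch_size):
            pending.append(pool.submit(send, batch))
            submitted += len(batch)
            if len(pending) >= max_in_flight:
                pending.popleft().result()  # wait for the oldest batch to finish
        while pending:
            pending.popleft().result()
    return submitted


sizes: List[int] = []
lock = threading.Lock()


def fake_send(batch: List[dict]) -> None:
    # Stand-in for an HTTP call to Solr; it only records the batch size.
    with lock:
        sizes.append(len(batch))


total = index_in_parallel(({"id": str(i)} for i in range(2500)), fake_send)
print(total, sorted(sizes))  # 2500 [500, 1000, 1000]
```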

Re: Solr 8.5.2 indexing issue

2020-06-28 Thread Erick Erickson
How are you sending this to Solr? I just tried 8.5, submitting that doc through the admin UI and it works fine. I defined “asset_id” with as the same type as your reference_url field. And does the log on the Solr node that tries to index this give any more info? Best, Erick > On Jun 27, 2020,

Solr 8.5.2 indexing issue

2020-06-27 Thread gnandre
Hi, I have the following document which fails to get indexed. { "asset_id":"add-ons:576deefef7453a9189aa039b66500eb2", "reference_url":"modeling-a-high-speed-backplane-part-3-4-port-s-parameters-to-differential-tdr-and-tdt.html"} I am not sure what is so special about the content in

Re: Prevent Re-indexing if Doc Fields are Same

2020-06-26 Thread Walter Underwood
it does an in-place update without deletion. But the > problem is I don't know if the document is present or I'm indexing it the > first time. > > Is there a way to prevent re-indexing if other fields are the same? > > *P.S. I'm looking for a solution that doesn't require looking up if doc is > present in the Collection or not.*

Prevent Re-indexing if Doc Fields are Same

2020-06-26 Thread Anshuman Singh
the fields, it deletes the document and re-index it. But if I just "set" the "LASTUPDATETIME" field (non-indexed, non-stored, docValue field), it does an in-place update without deletion. But the problem is I don't know if the document is present or I'm indexing it the first time.
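For reference, the in-place update described here is sent as an atomic-update document whose field value is a modifier object such as `{"set": value}`; Solr only applies it in place when the target is a single-valued, non-indexed, non-stored numeric docValues field, which matches the `LASTUPDATETIME` field described. The sketch builds such a request body. The optional `"_version_": 1` relies on Solr's optimistic-concurrency rule that the document must already exist, which is one possible way to avoid a separate lookup; treat that option and the epoch-millisecond value as assumptions to verify for your version.

```python
import json
import time
from typing import Any, Dict, List


def in_place_set(doc_id: str, field: str, value: Any,
                 require_existing: bool = False) -> Dict[str, Any]:
    """Atomic-update document that only sets one field.

    With require_existing=True, "_version_": 1 asks Solr to reject the update
    if no document with this id exists yet (optimistic concurrency).
    """
    doc: Dict[str, Any] = {"id": doc_id, field: {"set": value}}
    if require_existing:
        doc["_version_"] = 1
    return doc


# Assumed: LASTUPDATETIME holds epoch milliseconds in a long docValues field.
now_ms = int(time.time() * 1000)
updates: List[Dict[str, Any]] = [
    in_place_set("doc-42", "LASTUPDATETIME", now_ms, require_existing=True)
]
body = json.dumps(updates)  # body for a POST to /solr/<collection>/update
print(body)
```

If Solr rejects such an update with a version-conflict error because the document is missing, the client can fall back to sending the full document.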

Indexing error when using Category Routed Alias

2020-06-09 Thread Tom Evans
Hi all 1. Setup simple 1 node solrcloud test setup using docker-compose, solr:8.5.2, zookeeper:3.5.8. 2. Upload a configset 3. Create two collections, one standard collection, one CRA, both using the same configset legacy: action=CREATE=products_old=products=true=1=-1 CRA: { "create-alias":

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Fiz N
Thanks Erick... On Sun, Jun 7, 2020 at 1:50 PM Erick Erickson wrote: > https://lucidworks.com/post/indexing-with-solrj/ > > > > On Jun 7, 2020, at 3:22 PM, Fiz N wrote: > > > > Thanks Jorn and Erick. > > > > Hi Erick, looks like the skeletal SOLRJ progra

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Erick Erickson
https://lucidworks.com/post/indexing-with-solrj/ > On Jun 7, 2020, at 3:22 PM, Fiz N wrote: > > Thanks Jorn and Erick. > > Hi Erick, looks like the skeletal SOLRJ program attachment is missing. > > Thanks > Fiz > > On Sun, Jun 7, 2020 at 12:20 PM Erick E

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Fiz N
Thanks Jorn and Erick. Hi Erick, looks like the skeletal SOLRJ program attachment is missing. Thanks Fiz On Sun, Jun 7, 2020 at 12:20 PM Erick Erickson wrote: > Here’s a skeletal SolrJ program using Tika as another alternative. > > Best, > Erick > > > On Jun 7, 2020, at 2:06 PM, Jörn Franke

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Erick Erickson
Here’s a skeletal SolrJ program using Tika as another alternative. Best, Erick > On Jun 7, 2020, at 2:06 PM, Jörn Franke wrote: > > You have to write an external application that creates multiple threads, > parses the PDFs and index them in Solr. Ideally you parse the PDFs once and > store

Re: Indexing PDF on SOLR 8.5

2020-06-07 Thread Jörn Franke
You have to write an external application that creates multiple threads, parses the PDFs and index them in Solr. Ideally you parse the PDFs once and store the resulting text on some file system and then index it. Reason is that if you upgrade to two major versions of Solr you might need to
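Jörn's "parse once, store the text, then index" advice can be sketched as a small cache. Each source file's extracted text is written to a cache directory under a name derived from a hash of its full path (so identically named PDFs in different folders do not collide), and it is re-extracted only when the source file is newer. `extract` is a hypothetical callable standing in for Tika or any other parser; the multi-threading Jörn mentions could be layered on top, for example with `concurrent.futures`.

```python
import hashlib
from pathlib import Path
from typing import Callable


def extract_once(source: Path, cache_dir: Path, extract: Callable[[Path], str]) -> Path:
    """Return the cached text file for `source`, extracting only when needed.

    The cache file name is a hash of the source's absolute path, so files with
    the same name in different folders do not overwrite each other.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha1(str(source.resolve()).encode("utf-8")).hexdigest()
    target = cache_dir / f"{key}.txt"
    if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
        target.write_text(extract(source), encoding="utf-8")
    return target
```

Indexing then reads the `.txt` files, so a later Solr upgrade that requires a full re-index does not require parsing the PDFs again.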

Indexing PDF on SOLR 8.5

2020-06-07 Thread Fiz N
Hello SOLR Experts, I am working on a POC to index millions of PDF documents present in multiple folders in a fileshare. Could you please let me know the best practices and steps to implement it? Thanks Fiz Nadiyal.

Re: Not all EML files are indexing during indexing

2020-06-03 Thread Charlie Hull
I think the OP is indexing flat files, not web pages (but otherwise, I agree with you that Scrapy is great - I know some of the people behind it too and they're a good bunch). Charlie On 02/06/2020 16:41, Walter Underwood wrote: On Jun 2, 2020, at 7:40 AM, Charlie Hull wrote: If it was me

Re: Not all EML files are indexing during indexing

2020-06-02 Thread Walter Underwood
> On Jun 2, 2020, at 7:40 AM, Charlie Hull wrote: > > If it was me I'd probably build a standalone indexer script in Python that > did the file handling, called out to a separate Tika service for extraction, > posted to Solr. I would do the same thing, and I would base that script on Scrapy

Re: Not all EML files are indexing during indexing

2020-06-02 Thread Charlie Hull
probably build a standalone indexer script in Python that did the file handling, called out to a separate Tika service for extraction, posted to Solr. Cheers Charlie On 02/06/2020 14:48, Zheng Lin Edwin Yeo wrote: Hi Charlie, The main code that is doing the indexing is from the Solr's SimplePos
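The last step of Charlie's suggested script, posting to Solr, can be done with the Python standard library alone. The sketch builds (but does not send) a request to Solr's JSON update endpoint; the base URL, core name, and document fields are hypothetical, and the Tika extraction step is left out.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import Request


def solr_update_request(base_url: str, core: str, docs: List[dict],
                        commit_within_ms: Optional[int] = None) -> Request:
    """Build a POST to Solr's JSON update handler for a list of documents."""
    url = f"{base_url.rstrip('/')}/{core}/update"
    if commit_within_ms is not None:
        url += "?" + urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"},
                   method="POST")


# Hypothetical core and fields; "content" would come from the Tika step.
req = solr_update_request("http://localhost:8983/solr", "emails",
                          [{"id": "msg-001.eml", "content": "extracted text"}],
                          commit_within_ms=10_000)
print(req.get_method(), req.full_url)
```

`urllib.request.urlopen(req)` would send it. `commitWithin` asks Solr to make the documents searchable within that many milliseconds, instead of the script issuing an explicit commit per batch.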

Re: Not all EML files are indexing during indexing

2020-06-02 Thread Zheng Lin Edwin Yeo
Hi Charlie, The main code that is doing the indexing is from the Solr's SimplePostTools, but we have done some modification to it. The walking through a folder is done by PowerShell script, the extracting of the content from .eml file is from Tika that comes with Solr, and the images in the .eml

Re: Not all EML files are indexing during indexing

2020-06-01 Thread Charlie Hull
Hi Edwin, What code is actually doing the indexing? AFAIK Solr doesn't include any code for actually walking a folder, extracting the content from .eml files and pushing this data into its index, so I'm guessing you've built something external? Charlie On 01/06/2020 02:13, Zheng Lin Edwin

Not all EML files are indexing during indexing

2020-05-31 Thread Zheng Lin Edwin Yeo
Hi, I am running this on Solr 7.6.0. Currently I have a situation whereby there are more than 2 million EML files in a folder, and the folder is constantly updating the EML files with the latest information and adding new EML files. When I do the indexing, it is supposed to index the new EML files
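For a folder that keeps receiving new and updated files, one common approach (not necessarily what was done in this thread) is to remember when the last run started and pick up only files modified after that. A minimal sketch, assuming the files sit directly in one folder and that their modification times are reliable:

```python
from pathlib import Path
from typing import List


def changed_since(folder: Path, last_run: float, pattern: str = "*.eml") -> List[Path]:
    """Files in `folder` matching `pattern` modified after `last_run` (epoch seconds)."""
    return sorted(p for p in folder.glob(pattern)
                  if p.is_file() and p.stat().st_mtime > last_run)
```

Record `time.time()` before a scan starts rather than after it finishes, so files written while a run is in progress are picked up by the next one.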

Re: Indexing huge data onto solr

2020-05-26 Thread Erick Erickson
n table 6) > (table 7 join table 8) > > > > Do you have any recommendations for it to run multiple sql’s and make it as > single solr document that can be sent over solrJ for indexing? > > Say parent entity has 100 documents, should I iterate over ea

RE: Indexing huge data onto solr

2020-05-25 Thread Srinivas Kashyap
table 2) (table 3 join table 4) (table 5 join table 6) (table 7 join table 8) Do you have any recommendations for it to run multiple sql’s and make it as single solr document that can be sent over solrJ for indexing? Say parent entity has 100
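One way to turn several join queries into one Solr document per parent, without querying once per parent row, is to stream each result set once and group rows by the parent key in memory. A sketch, assuming each child query returns `(parent_id, value)` pairs and that parents are processed in slices small enough to fit in memory; the field names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def merge_into_documents(parents: Iterable[dict],
                         child_rows: Dict[str, Iterable[Tuple[str, object]]]) -> List[dict]:
    """Attach child values to their parent documents as multi-valued fields.

    `parents` are rows from the main query (each with an "id");
    `child_rows` maps a Solr field name to the (parent_id, value) rows
    returned by one join query.
    """
    docs = {p["id"]: dict(p) for p in parents}
    for field, rows in child_rows.items():
        for parent_id, value in rows:
            doc = docs.get(parent_id)
            if doc is not None:  # skip rows whose parent is not in this slice
                doc.setdefault(field, []).append(value)
    return list(docs.values())


parents = [{"id": "1", "name": "alpha"}, {"id": "2", "name": "beta"}]
children = {
    "tag": [("1", "x"), ("1", "y"), ("3", "orphan")],
    "owner": [("2", "bob")],
}
print(merge_into_documents(parents, children))
# [{'id': '1', 'name': 'alpha', 'tag': ['x', 'y']}, {'id': '2', 'name': 'beta', 'owner': ['bob']}]
```

The resulting dictionaries can then be sent in batches through SolrJ or the JSON update handler.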

Re: Indexing huge data onto solr

2020-05-22 Thread matthew sporleder
PM Erick Erickson wrote: > > You have a lot more control over the speed and form of importing data if > you just do the initial load in SolrJ. Here’s an example, taking the Tika > parts out is easy: > > https://lucidworks.com/post/indexing-with-solrj/ > > It’s especially inst

Re: Indexing huge data onto solr

2020-05-22 Thread Erick Erickson
You have a lot more control over the speed and form of importing data if you just do the initial load in SolrJ. Here’s an example, taking the Tika parts out is easy: https://lucidworks.com/post/indexing-with-solrj/ It’s especially instructive to comment out just the call to CloudSolrClient.add

Indexing huge data onto solr

2020-05-22 Thread Srinivas Kashyap
Hi All, We are running solr 8.4.1. We have a database table which has more than 100 million records. Till now we were using DIH to do full-import on the tables. But for this table, when we do full-import via DIH it is taking more than 3-4 days to complete and also it consumes fair bit of

Re: Different indexing times for two different collections with different data sizes

2020-05-20 Thread Erick Erickson
issues at significantly than 2B. Note that when segments are merged, the internal IDs get reassigned... Indexing scales pretty linearly with the number of shards, _assuming_ you’re adding more hardware. To really answer the question you need to look at what the bottleneck is on your current
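Erick's point about the 2B limit can be checked with simple arithmetic. Lucene caps a single index (one shard replica) at `IndexWriter.MAX_DOCS`, which is `Integer.MAX_VALUE - 128`, and deleted documents that have not yet been merged away count toward it. Assuming documents are spread evenly across shards, the collection described in this thread (about 3.2 billion docs on 6 shards) uses roughly a quarter of that limit per shard:

```python
LUCENE_MAX_DOCS = 2**31 - 1 - 128  # IndexWriter.MAX_DOCS = Integer.MAX_VALUE - 128


def docs_per_shard(total_docs: int, num_shards: int) -> int:
    """Largest shard size when documents are spread evenly (ceiling division)."""
    return -(-total_docs // num_shards)


def fraction_of_limit(total_docs: int, num_shards: int) -> float:
    """Share of the per-index document limit used by the largest shard."""
    return docs_per_shard(total_docs, num_shards) / LUCENE_MAX_DOCS


# ~3.2 billion docs on 6 shards.
print(docs_per_shard(3_200_000_000, 6))               # 533333334
print(round(fraction_of_limit(3_200_000_000, 6), 3))  # 0.248
```

Real shards are rarely perfectly even, so the per-shard `maxDoc` reported by Solr is the number to watch.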

Different indexing times for two different collections with different data sizes

2020-05-20 Thread Kommu, Vinodh K.
Hi, Recently we had noticed that one of the largest collection (shards = 6 ; replication factor =3) which holds up to 1TB of data & nearly 3.2 billion of docs is taking longer time to index than it used to before. To see the indexing time difference, we created another collection using lar

Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey
On 5/14/2020 3:14 PM, matthew sporleder wrote: > Can a non-nested entity write into existing docs, or do they always > have to produce document-per-entity? This is the only thing I found on this topic, and it is on a third-party website, so I can't say much about how accurate it is:

Re: nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
On Thu, May 14, 2020 at 4:46 PM Shawn Heisey wrote: > > On 5/14/2020 9:36 AM, matthew sporleder wrote: > > It appears that adding entities to my entities in my data import > > config is slowing down my import process by a lot. Is there a good > > way to speed this up? I see the ID's are

Re: nested entities and DIH indexing time

2020-05-14 Thread Shawn Heisey
On 5/14/2020 9:36 AM, matthew sporleder wrote: It appears that adding entities to my entities in my data import config is slowing down my import process by a lot. Is there a good way to speed this up? I see the ID's are individually queried instead of using IN() or similar normal techniques to

nested entities and DIH indexing time

2020-05-14 Thread matthew sporleder
It appears that adding entities to my entities in my data import config is slowing down my import process by a lot. Is there a good way to speed this up? I see the ID's are individually queried instead of using IN() or similar normal techniques to make things faster. Just looking for some tips.
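The `IN()` approach mentioned here, done outside DIH, looks roughly like this: fetch the children for a whole batch of parent IDs with one query instead of one query per parent. The sketch uses an in-memory SQLite table with hypothetical names; a real database, and its limit on the number of bound parameters, will differ, so batch the IDs accordingly.

```python
import sqlite3
from typing import Dict, List, Sequence


def children_for(conn: sqlite3.Connection, parent_ids: Sequence[int]) -> Dict[int, List[str]]:
    """Fetch child names for many parents with one IN (...) query."""
    if not parent_ids:
        return {}
    placeholders = ",".join("?" for _ in parent_ids)  # one bound parameter per id
    sql = (f"SELECT parent_id, name FROM child WHERE parent_id IN ({placeholders}) "
           "ORDER BY parent_id, name")
    grouped: Dict[int, List[str]] = {pid: [] for pid in parent_ids}
    for parent_id, name in conn.execute(sql, list(parent_ids)):
        grouped[parent_id].append(name)
    return grouped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE child (parent_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO child VALUES (?, ?)",
                 [(1, "a"), (1, "b"), (2, "c"), (3, "d")])
print(children_for(conn, [1, 2, 4]))  # {1: ['a', 'b'], 2: ['c'], 4: []}
```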

Re: Indexing Korean

2020-05-13 Thread ART GALLERY
; Regards, > Markus > > > > -Original message- > > From:Audrey Lorberfeld - audrey.lorberf...@ibm.com > > > Sent: Friday 1st May 2020 17:34 > > To: solr-user@lucene.apache.org > > Subject: Indexing Korean > > >

RE: Indexing Korean

2020-05-04 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
--Original message- > From:Audrey Lorberfeld - audrey.lorberf...@ibm.com > Sent: Friday 1st May 2020 17:34 > To: solr-user@lucene.apache.org > Subject: Indexing Korean > > Hi All, > > My team would like to index Korean, but it l

RE: Indexing Korean

2020-05-01 Thread Markus Jelsma
erf...@ibm.com > Sent: Friday 1st May 2020 17:34 > To: solr-user@lucene.apache.org > Subject: Indexing Korean > > Hi All, > > My team would like to index Korean, but it looks like Solr OOTB does not have > explicit support for Korean. If any of you have schema pipelines

Indexing Korean

2020-05-01 Thread Audrey Lorberfeld - audrey.lorberf...@ibm.com
Hi All, My team would like to index Korean, but it looks like Solr OOTB does not have explicit support for Korean. If any of you have schema pipelines you could share for your Korean documents, I would love to see them! I'm assuming I would just use some combination of the OOTB CJK

Re: Solr indexing with Tika DIH - ZeroByteFileException

2020-04-23 Thread Charlie Hull
If users can upload any PDF, including broken or huge ones, and some cause a Tika error, you should decouple Tika from Solr and run it as a separate process to extract text before indexing with Solr. Otherwise some of what is uploaded *will* break Solr. https://lucidworks.com/post/indexing

Re: Solr indexing with Tika DIH - ZeroByteFileException

2020-04-22 Thread ravi kumar amaravadi
Hi, I am also facing the same issue. Does anyone have any update/solution on how to fix this issue as part of DIH? Thanks. Regards, Ravi kumar -- Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Indexing data from multiple data sources

2020-04-20 Thread Charlie Hull
with this link https://sematext.com/opensee/m/Solr/eHNlswSd1vD6AF?subj=RE+Indexing+data+from+multiple+data+sources As it is open to the world, what we are requesting here is, could you please remove that post as soon as possible before it creates any security issues for us. Your help is very very

Re: Indexing data from multiple data sources

2020-04-18 Thread RaviKiran Moola
+Indexing+data+from+multiple+data+sources As it is open to the world, what we are requesting here is, could you please remove that post as soon as possible before it creates any security issues for us. Your help is very much appreciated! FYI. Here I'm attaching the below screenshot

Indexing data from multiple data sources(CSV, RDBMS)

2020-04-18 Thread Shravan Kumar Bolla
Hi, I am working on indexing data from multiple data sources using a single collection. I specified data sources information in the data-config file and also updated managed schema.xml by adding the fields from all the data sources by specifying the common unique key across all the sources

Re: Indexing data from multiple data sources

2020-04-17 Thread Jörn Franke
What does your Solr.log say? Any error? > On 17.04.2020 at 20:22, RaviKiran Moola wrote: > > > Hi, > > Greetings!!! > > We are working on indexing data from multiple data sources (MySQL & MSSQL) in > a single collection. We specified data sour

RE: Indexing data from multiple data sources

2020-04-17 Thread RaviKiran Moola
Hi, Greetings!!! We are working on indexing data from multiple data sources (MySQL & MSSQL) in a single collection. We specified data source details like connection details along with the required fields for both data sources in a single data config file, along with specified required fi

Re: Inconsistent / confusing documentation on indexing nested documents.

2020-04-03 Thread Chris Hostetter
: Is the documentation wrong or have I misunderstood it? The documentation is definitely wrong, thanks for pointing this out... https://issues.apache.org/jira/browse/SOLR-14383 -Hoss http://www.lucidworks.com/

Inconsistent / confusing documentation on indexing nested documents.

2020-04-03 Thread Peter Pimley
Hi, The page "Indexing Nested Documents" has an XML example showing two different ways of adding nested documents: https://lucene.apache.org/solr/guide/8_5/indexing-nested-documents.html#xml-examples The text says: "It illustrates two styles of adding child documents: the firs

Debugging indexing timeouts

2020-03-02 Thread fredsearch157
Hi all, A couple of months ago, I migrated my solr deployment off of some legacy hardware (old spinning disks), and onto much newer hardware (SSD's, newer processors). While I am seeing much improved search performance since this move, I am also seeing intermittent indexing timeouts for 10-15
