Re: Data Import Handler (DIH) - Installing and running

2020-12-23 Thread Erick Erickson
Have you done what the message says and looked at your Solr log? If so, what information is there? > On Dec 23, 2020, at 5:13 AM, DINSD | SPAutores > wrote: > > Hi, > > I'm trying to install the package "data-import-handler", since it was > discontinued from core SolR distro. > > https://git

Re: Data Import Blocker - Solr

2020-12-19 Thread Shawn Heisey
On 12/18/2020 12:03 AM, basel altameme wrote: While trying to Import & Index data from MySQL DB custom view i am facing the error below: Data Config problem: The value of attribute "query" associated with an element type "entity" must not contain the '<' character. Please note that in my SQL st

Re: Data Import Blocker - Solr

2020-12-18 Thread Erick Erickson
Have you tried escaping that character? > On Dec 18, 2020, at 2:03 AM, basel altameme > wrote: > > Dear, > While trying to Import & Index data from MySQL DB custom view i am facing the > error below: > Data Config problem: The value of attribute "query" associated with an > element type "enti
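A minimal sketch of the escaping suggested above, assuming the SQL sits in an XML attribute of the DIH config. The entity name, view name, and column are made up for illustration (the original poster's SQL is cut off in this archive). Python's standard `xml.sax.saxutils.quoteattr` turns `<` into `&lt;`, which is what an XML parser expects inside an attribute value.

```python
from xml.sax.saxutils import quoteattr

def entity_tag(name: str, query: str) -> str:
    """Render a DIH-style <entity> tag with the SQL escaped for XML."""
    # quoteattr escapes &, < and > and adds the surrounding quotes.
    return f"<entity name={quoteattr(name)} query={quoteattr(query)}/>"

# Hypothetical query, used only to show the effect of escaping.
print(entity_tag("orders", "SELECT id FROM orders_view WHERE qty < 10"))
```

When the XML is parsed again, the attribute value comes back as the original SQL, so the database still sees a plain `<`.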

Re: data import handler deprecated?

2020-11-30 Thread Dmitri Maziuk
On 11/30/2020 7:50 AM, David Smiley wrote: Yes, absolutely to what Eric said. We goofed on news / release highlights on how to communicate what's happening in Solr. From a Solr insider point of view, we are "deprecating" because strictly speaking, the code isn't in our codebase any longer. Fro

Re: data import handler deprecated?

2020-11-30 Thread David Smiley
Yes, absolutely to what Eric said. We goofed on news / release highlights on how to communicate what's happening in Solr. From a Solr insider point of view, we are "deprecating" because strictly speaking, the code isn't in our codebase any longer. From a user point of view (the audience of news

Re: data import handler deprecated?

2020-11-30 Thread Eric Pugh
You don’t need to abandon DIH right now…. You can just use the GitHub hosted version…. The more people who use it, the better a community will form around it! It’s a bit chicken and egg, since no one is actively discussing it, submitting PRs etc, it may languish. If you use it, and

Re: data import handler deprecated?

2020-11-29 Thread Dmitri Maziuk
On 11/29/2020 10:32 AM, Erick Erickson wrote: And I absolutely agree with Walter that the DB is often where the bottleneck lies. You might be able to use multiple threads and/or processes to query the DB if that’s the case and you can find some kind of partition key. IME the difficult part has

Re: data import handler deprecated?

2020-11-29 Thread Erick Erickson
If you like Java instead of Python, here’s a skeletal program: https://lucidworks.com/post/indexing-with-solrj/ It’s simple and single-threaded, but could serve as a basis for something along the lines that Walter suggests. And I absolutely agree with Walter that the DB is often where the bottle

Re: data import handler deprecated?

2020-11-29 Thread Walter Underwood
I recommend building an outboard loader, like I did a dozen years ago for Solr 1.3 (before DIH) and did again recently. I’m glad to send you my Python program, though it reads from a JSONL file, not a database. Run a loop fetching records from a database. Put each record into a synchronized (threa
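The outboard-loader idea above can be sketched roughly as follows. This is not Walter's program; it is a small illustration, assuming a thread-safe queue between one reader and several sender threads. `send_batch` is a hypothetical callback that would post a batch of documents to Solr, and `records` stands in for a loop over a database cursor.

```python
import queue
import threading

def run_loader(records, send_batch, workers=4, batch_size=100):
    """Feed records through a thread-safe queue to workers that send batches."""
    q = queue.Queue(maxsize=1000)   # bounded, so a slow sender throttles the reader
    done = object()                 # sentinel telling a worker to stop

    def worker():
        batch = []
        while True:
            item = q.get()
            if item is done:
                break
            batch.append(item)
            if len(batch) >= batch_size:
                send_batch(batch)
                batch = []
        if batch:                   # flush the final partial batch
            send_batch(batch)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for rec in records:             # in practice: rows fetched from the database
        q.put(rec)
    for _ in threads:               # one sentinel per worker
        q.put(done)
    for t in threads:
        t.join()
```

The bounded queue is a design choice: if the database is faster than the senders, the reader blocks instead of filling memory.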

Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
I went through the same stages of grief that you are about to start but (luckily?) my core dataset grew some weird cousins and we ended up writing our own indexer to join them all together/do partial updates/other stuff beyond DIH. It's not difficult to upload docs but is definitely slower so far.

Re: data import handler deprecated?

2020-11-28 Thread Dmitri Maziuk
On 11/28/2020 5:48 PM, matthew sporleder wrote: ... The bottom of that github page isn't hopeful however :) Yeah, "works with MariaDB" is a particularly bad way of saying "BYO JDBC JAR" :) It's a more general question though, what is the path forward for users with data in two places?

Re: data import handler deprecated?

2020-11-28 Thread matthew sporleder
https://solr.cool/#utilities -> https://github.com/rohitbemax/dataimporthandler You can import it in the many new/novel ways to add things to a solr install and it should work like always (apparently). The bottom of that github page isn't hopeful however :) On Sat, Nov 28, 2020 at 5:21 PM Dmitri

Re: Data Import Handler - Concurrent Entity Importing

2020-05-05 Thread Mikhail Khludnev
Hello, James. DataImportHandler has a lock preventing concurrent execution. If you need to run several imports in parallel at the same core, you need to duplicate "/dataimport" handlers definition in solrconfig.xml. Thus, you can run them in parallel. Regarding schema, I prefer the latter but mile

Re: data-import-handler for solr-7.5.0

2018-10-02 Thread Alexandre Rafalovitch
Data, IM & Analytics > > > > Lautrupparken 40-42, DK-2750 Ballerup > E-mail m...@kmd.dk Web www.kmd.dk > Mobile +4525571418 > > -Original Message- > From: Alexandre Rafalovitch > Sent: 2 October 2018 18:18 > To: solr-user > Subject: Re: data-imp

Re: data-import-handler for solr-7.5.0

2018-10-02 Thread Alexandre Rafalovitch
Admin UI for DIH will show you the config file read. So, if nothing is there, the path is most likely the issue You can also provide or update the configuration right in UI if you enable debug. Finally, the config file is reread on every invocation, so you don't need to restart the core after cha

Re: data-import-handler for solr-7.5.0

2018-10-02 Thread Jan Høydahl
> url="C:/Users/z6mhq/Desktop/data_import/nh_test.xml" Have you tried url="C:\\Users\\z6mhq\\Desktop\\data_import\\nh_test.xml" ? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 2 Oct 2018, at 17:15, Martin Frank Hansen (MHQ) wrote: > > Hi, > > I am having some pr

Re: Data Import Handler with Solr Source behind Load Balancer

2018-09-14 Thread Emir Arnautović
Hi Thomas, Is this SolrCloud or Solr master-slave? Do you update index while indexing? Did you check if all your instances behind LB are in sync if you are using master-slave? My guess would be that DIH is using cursors to read data from another Solr. If you are using multiple Solr instances beh

Re: Data Import from Command Line

2018-08-20 Thread Adam Blank
Thank you both for the responses. I was able to get the import working through telnet, and I'll see if I can get the post utility working as that seems like a better option. Thanks, Adam On Mon, Aug 20, 2018, 2:04 PM Alexandre Rafalovitch wrote: > Admin UI just hits Solr for a particular URL wi

Re: Data Import from Command Line

2018-08-20 Thread Alexandre Rafalovitch
Admin UI just hits Solr for a particular URL with specific parameters. You could totally call it from the command line, but it _would_ need to be an HTTP client of some sort. You could encode all of the parameters into the DIH (or a new) handler, it is all defined in solrconfig.xml (/dataimport is

Re: Data Import from Command Line

2018-08-20 Thread Christopher Schultz
Adam, On 8/20/18 1:45 PM, Adam Blank wrote: > I'm running Solr 5.5.0 on AIX, and I'm wondering if there's a way > to import the index from the command line instead of using the > admin console? I don't have the ability to use a HTTP client such > a

Re: Data import batch mode for delta

2018-04-17 Thread Shawn Heisey
On 4/16/2018 7:32 PM, gadelkareem wrote: I cannot complain cuz it actually worked well for me so far but.. I still do not understand if Solr already paginates the results from the full import, why not do the same for the delta. It is almost the same query: `select id from t where t.lastmod > ${s

Re: Data import batch mode for delta

2018-04-16 Thread gadelkareem
Thanks Shawn. I cannot complain cuz it actually worked well for me so far but.. I still do not understand if Solr already paginates the results from the full import, why not do the same for the delta. It is almost the same query: `select id from t where t.lastmod > ${solrTime}` `select * from t w

Re: Data import batch mode for delta

2018-04-05 Thread Shawn Heisey
On 4/5/2018 7:31 PM, gadelkareem wrote: Why the deltaImportQuery uses "where id='${dataimporter.id}'" instead of something like where id IN ('${dataimporter.id})' Because there's only one value for that property. If the deltaQuery returns a million rows, then deltaImportQuery is going to be e
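To make the cost of "one query per changed row" concrete, here is a small illustration using an in-process SQLite database. It is not DIH code. The table name `t` is borrowed from the queries quoted elsewhere in this thread, the `name` column is hypothetical, and the chunked `IN (...)` variant is the kind of thing a custom loader could do, not something the message above says DIH supports.

```python
import sqlite3

def fetch_one_by_one(conn, ids):
    """One SELECT per changed id: N ids means N round trips."""
    rows = []
    for i in ids:
        rows += conn.execute(
            "SELECT id, name FROM t WHERE id = ?", (i,)
        ).fetchall()
    return rows

def fetch_in_chunks(conn, ids, chunk=500):
    """Fetch the same rows with IN (...) lists, one chunk of ids per query."""
    rows = []
    for start in range(0, len(ids), chunk):
        part = ids[start:start + chunk]
        marks = ",".join("?" * len(part))   # one placeholder per id
        rows += conn.execute(
            f"SELECT id, name FROM t WHERE id IN ({marks})", part
        ).fetchall()
    return rows
```

Both functions return the same rows; the second issues roughly `len(ids) / chunk` queries instead of `len(ids)`.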

RE: data import class not found

2017-08-31 Thread Steve Pruitt
I just tried putting the solr-dataimporthandler-6.6.0.jar in server/solr/lib and I got past the problem. I still don't understand why not found in /dist -Original Message- From: Steve Pruitt [mailto:bpru...@opentext.com] Sent: Thursday, August 31, 2017 3:05 PM To: solr-user@lucene.apach

Re: Data Import

2017-03-17 Thread Mike Thomsen
If Solr is down, then adding through SolrJ would fail as well. Kafka's new API has some great features for this sort of thing. The new client API is designed to be run in a long-running loop where you poll for new messages with a certain amount of defined timeout (ex: consumer.poll(1000) for 1s) So

Re: Data Import

2017-03-17 Thread OTH
Are Kafka and SQS interchangeable? (The latter does not seem to be free.) @Wunder: I'm assuming, that updating to Solr would fail if Solr is unavailable not just if posting via say a DB trigger, but probably also if trying to post through SolrJ? (Which is what I'm using for now.) So, even if us

RE: Data Import

2017-03-17 Thread Liu, Daphne
@lucene.apache.org Subject: Re: Data Import Hi Daphne, Are you using DSE? Thanks & Regards, Vishal On Fri, Mar 17, 2017 at 7:40 PM, Liu, Daphne wrote: > I just want to share my recent project. I have successfully sent all > our EDI documents to Cassandra 3.7 clusters using Solr 6.3 Data Imp

Re: Data Import

2017-03-17 Thread vishal jain
Streaming the data through kafka would be a good option if near real time data indexing is the key requirement. In our application the RDBMS data is populated by an ETL job periodically so we don't need real time data indexing for now. Cheers, Vishal On Fri, Mar 17, 2017 at 10:30 PM, Erick Ericks

Re: Data Import

2017-03-17 Thread Walter Underwood
That fails if Solr is not available. To avoid dropping updates, you need some kind of persistent queue. We use Amazon SQS for our incremental updates. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 17, 2017, at 10:09 AM, OTH wrote: > > Could

Re: Data Import

2017-03-17 Thread vishal jain
> > CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL > 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.928.1448 / > daphne@cevalogistics.com > > > > -Original Message- > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] >

Re: Data Import

2017-03-17 Thread OTH
Could the database trigger not just post the change to solr? On Fri, Mar 17, 2017 at 10:00 PM, Erick Erickson wrote: > Or set a trigger on your RDBMS's main table to put the relevant > information in a different table (call it EVENTS) and have your SolrJ > consult the EVENTS table periodically.

Re: Data Import

2017-03-17 Thread Erick Erickson
Or set a trigger on your RDBMS's main table to put the relevant information in a different table (call it EVENTS) and have your SolrJ consult the EVENTS table periodically. Essentially you're using the EVENTS table as a queue where the trigger is the producer and the SolrJ program is the consumer.

Re: Data Import

2017-03-17 Thread vishal jain
Thanks to all of you for the valuable inputs. Being on J2ee platform I also felt using solrJ in a multi threaded environment would be a better choice to index RDBMS data into SolrCloud. I will try with a scheduler triggered micro service to do the job using SolrJ. Regards, Vishal On Fri, Mar 17,

Re: Data Import

2017-03-17 Thread Alexandre Rafalovitch
One assumes by hooking into the same code that updates RDBMS, as opposed to be reverse engineering the changes from looking at the DB content. This would be especially the case for Delete changes. Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced O

Re: Data Import

2017-03-17 Thread OTH
> > Also, solrj is good when you want your RDBMS updates make immediately > available in solr. How can SolrJ be used to make RDBMS updates immediately available? Thanks On Fri, Mar 17, 2017 at 2:28 PM, Sujay Bawaskar wrote: > Hi Vishal, > > As per my experience DIH is the best for RDBMS to solr

RE: Data Import

2017-03-17 Thread Liu, Daphne
s.com -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Friday, March 17, 2017 9:54 AM To: solr-user Subject: Re: Data Import I feel DIH is much better for prototyping, even though people do use it in production. If you do want to use DIH, you may benefit

Re: Data Import

2017-03-17 Thread Alexandre Rafalovitch
I feel DIH is much better for prototyping, even though people do use it in production. If you do want to use DIH, you may benefit from reviewing the DIH-DB example I am currently rewriting in https://issues.apache.org/jira/browse/SOLR-10312 (may need to change luceneMatchVersion in solrconfig.xml f

Re: Data Import

2017-03-17 Thread Shawn Heisey
On 3/17/2017 3:04 AM, vishal jain wrote: > I am new to Solr and am trying to move data from my RDBMS to Solr. I know the > available options are: > 1) Post Tool > 2) DIH > 3) SolrJ (as ours is a J2EE application). > > I want to know what is the recommended way for Data import in production > envir

Re: Data Import

2017-03-17 Thread Sujay Bawaskar
Hi Vishal, As per my experience DIH is the best for RDBMS to solr index. DIH with caching has best performance. DIH nested entities allow you to define simple queries. Also, solrj is good when you want your RDBMS updates make immediately available in solr. DIH full import can be used for index all

Re: Data Import Handler on 6.4.1

2017-03-15 Thread Walter Underwood
Also, upgrade to 6.4.2. There are serious performance problems in 6.4.0 and 6.4.1. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 15, 2017, at 12:05 PM, Liu, Daphne > wrote: > > For Solr 6.3, I have to move mine to > ../solr-6.3.0/server/s

RE: Data Import Handler on 6.4.1

2017-03-15 Thread Liu, Daphne
For Solr 6.3, I have to move mine to ../solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib. If you are using jetty. Kind regards, Daphne Liu BI Architect - Matrix SCM CEVA Logistics / 10751 Deerwood Park Blvd, Suite 200, Jacksonville, FL 32256 USA / www.cevalogistics.com T 904.564.1192 / F 904.

Re: Data Import Handler, also "Real Time" index updates

2017-03-05 Thread Damien Kamerman
You could configure the DataImportHandler to not delete at the start (either do a delta or set preImportDeleteQuery), and set a postImportDeleteQuery if required. On Saturday, 4 March 2017, Alexandre Rafalovitch wrote: > Commit is index global. So if you have overlapping timelines and commit

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
Commit is index global. So if you have overlapping timelines and commit is issued, it will affect all changes done to that point. So, the aliases may be better for you. You could potentially also reload a core with changed solrconfig.xml settings, but that's heavy on caches. Regards, Alex On

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> > You have indicated that you have a way to avoid doing updates during the > full import. Because of this, you do have another option that is likely > much easier for you to implement: Set the "commitWithin" parameter on > each update request. This works almost identically to autoSoftCommit,

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Shawn Heisey
On 3/3/2017 10:17 AM, Sales wrote: > I am not sure how best to handle this. We use the data import handle to > re-sync all our data on a daily basis, takes 1-2 hours depending on system > load. It is set up to commit at the end, so, the old index remains until it’s > done, and, we lose no access

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> On Mar 3, 2017, at 11:30 AM, Erick Erickson wrote: > > One way to handle this (presuming SolrCloud) is collection aliasing. > You create two collections, c1 and c2. You then have two aliases. when > you start "index" is aliased to c1 and "search" is aliased to c2. Now > do your full import to

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Erick Erickson
One way to handle this (presuming SolrCloud) is collection aliasing. You create two collections, c1 and c2. You then have two aliases. when you start "index" is aliased to c1 and "search" is aliased to c2. Now do your full import to "index" (and, BTW, you'd be well advised to do at least a hard co
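A rough sketch of the alias swap described above, assuming the SolrCloud Collections API `CREATEALIAS` action is used to re-point each alias. The helper names are made up, and the functions only build the request URLs; sending them is left to whatever HTTP client is available.

```python
from urllib.parse import urlencode

def alias_url(base, alias, collection):
    """Collections API URL that points `alias` at `collection`."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{base}/admin/collections?{urlencode(params)}"

def swap_urls(base, freshly_indexed, previously_searched):
    """After a full import, search the fresh collection and index into the other."""
    return [
        alias_url(base, "search", freshly_indexed),
        alias_url(base, "index", previously_searched),
    ]

# With "index" -> c1 and "search" -> c2 before the import:
for url in swap_urls("http://localhost:8983/solr", "c1", "c2"):
    print(url)
```

Searchers only ever talk to the `search` alias, so they keep seeing the old collection until the first URL is called.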

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Sales
> > On Mar 3, 2017, at 11:22 AM, Alexandre Rafalovitch wrote: > > On 3 March 2017 at 12:17, Sales > wrote: >> When we enabled those, during the index, the data disappeared since it kept >> soft committing during the import process, > > This part does not quite make sense. Could you expand on

Re: Data Import Handler, also "Real Time" index updates

2017-03-03 Thread Alexandre Rafalovitch
On 3 March 2017 at 12:17, Sales wrote: > When we enabled those, during the index, the data disappeared since it kept > soft committing during the import process, This part does not quite make sense. Could you expand on this "data disappeared" part to understand what the issue is. The main issue

Re: Data Import Handlers not working after upgrade from 6.3.0 to 6.4.0

2017-01-25 Thread Shawn Heisey
On 1/25/2017 4:06 PM, Dan Scarf wrote: > I upgraded Solr 6.3.0 this morning to 6.4.0. All seemed good according to > the logs but this afternoon we discovered that the DataImport tabs in our > Collections now say: > > 'Sorry, no dataimport-handler defined!'. This is a bug that only applies to 6.4

Re: Data Import Handler - maximum?

2016-12-12 Thread Shawn Heisey
On 12/11/2016 8:00 PM, Brian Narsi wrote: > We are using Solr 5.1.0 and DIH to build index. > > We are using DIH with clean=true and commit=true and optimize=true. > Currently retrieving about 10.5 million records in about an hour. > > I will like to find from other member's experiences as to how l

Re: Data Import Handler - maximum?

2016-12-12 Thread Bernd Fehling
On 12.12.2016 at 04:00, Brian Narsi wrote: > We are using Solr 5.1.0 and DIH to build index. > > We are using DIH with clean=true and commit=true and optimize=true. > Currently retrieving about 10.5 million records in about an hour. > > I will like to find from other member's experiences as to

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-26 Thread Marek Ščevlík
I ran my jar application beside a running Solr instance where I want to trigger a DIH import. I tried this approach: String urlString1 = "http://localhost:8983/solr/db/dataimport"; SolrClient solr1 = new HttpSolrClient.Builder(urlString1).build(); ModifiableSolrParams params = new ModifiableSolrPar

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-26 Thread Erick Erickson
on a quick glance, and not having tried this myself... this seems wrong. You're setting a URL parameter "db": params.set("db","/dataimport"); that's equivalent to a URL like http://localhost:8983/solr&db=/dataimport you'd want: http://localhost:8983/solr/db/dataimport?command=full-import I thin
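The URL shape Erick describes (core name and handler in the path, `command` in the query string) can be illustrated with a small helper. The function name and defaults are assumptions; `clean` and `commit` are DIH request parameters mentioned elsewhere in this archive.

```python
from urllib.parse import urlencode

def dih_url(base, core, command="full-import", handler="/dataimport", **params):
    """Build a DIH request URL: core and handler in the path, options as parameters."""
    query = urlencode({"command": command, **params})
    return f"{base}/{core}{handler}?{query}"

print(dih_url("http://localhost:8983/solr", "db"))
# http://localhost:8983/solr/db/dataimport?command=full-import
print(dih_url("http://localhost:8983/solr", "db", clean="true", commit="true"))
# http://localhost:8983/solr/db/dataimport?command=full-import&clean=true&commit=true
```

The key point is that `db` and `/dataimport` belong in the path; passing them as parameters produces a request Solr cannot route to the handler.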

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-26 Thread Marek Ščevlík
Actually to be honest I realized that I only needed to trigger a data import handler from a jar file. Previously this was done in earlier versions via the SolrServer object. Now I am thinking if this is OK?: String urlString1 = "http://localhost:8983/solr/"; SolrClient solr1 = new HttpSolrClient.

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-25 Thread Marek Ščevlík
I forgot to mention I am creating a jar file beside of a running solr 6.3 instance to which I am hoping to attach with java via the SolrDispatchFilter to get at the cores and so then I could work with data in code. 2016-11-25 19:31 GMT+01:00 Marek Ščevlík : > Hi Daniel. Thanks for a reply. I won

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-25 Thread Marek Ščevlík
Hi Daniel. Thanks for the reply. I wonder whether it is still possible, with the release of Solr 6.3, to get hold of a running instance of the jetty server that is part of the solution? I found some code for previous versions where it was captured with this code and one could then obtain cores for a running so

Re: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Alexandre Rafalovitch
Is your goal to still index into Solr? It was not clear. If yes, then it has been discussed quite a bit. The challenge is that DIH is integrated into AdminUI, which makes it easier to see the progress and set some flags. Plus the required jars are loaded via solrconfig.xml, just like all other ext

RE: Data Import Request Handler isolated into its own project - any suggestions?

2016-11-18 Thread Davis, Daniel (NIH/NLM) [C]
Marek, I've wanted to do something like this in the past as well. However, a rewrite that supports the same XML syntax might be better. There are several problems with the design of the Data Import Handler that make it not quite suitable: - Not designed for Multi-threading - Bad implementati

RE: Data import handler in techproducts example

2016-07-07 Thread Brooks Chuck (FCA)
Hello Jonas, Did you figure this out? Dr. Chuck Brooks 248-838-5070 -Original Message- From: Jonas Vasiliauskas [mailto:jonas.vasiliaus...@yahoo.com.INVALID] Sent: Saturday, July 02, 2016 11:37 AM To: solr-user@lucene.apache.org Subject: Data import handler in techproducts example He

Re: Data import handler in techproducts example

2016-07-02 Thread Ahmet Arslan
Hi Jonas, Search for the solr-dataimporthandler-*.jar place it under a lib directory (same level as the solr.xml file) along with the mysql jdbc driver (mysql-connector-java-*.jar) Please see: https://cwiki.apache.org/confluence/display/solr/Lib+Directives+in+SolrConfig On Saturday, July 2,

Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Erick Erickson
There's nothing saying you have to highlight fields you search on. So you can specify hl.fl to be the "normal" (perhaps stored-only) fields and still search on the uber-field. Best, Erick On Thu, May 26, 2016 at 2:08 PM, kostali hassan wrote: > I did it , I copied all my dynamic field into text

Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
I did it: I copied all my dynamic fields into the text field and it works great. Just one question: even though I copied text into content and vice versa to get highlighting, that does not work. Is there another way to get highlighting? Thank you, Erick. 2016-05-26 18:28 GMT+01:00 Erick Erickson : > And, you can c

Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Erick Erickson
And, you can copy all of the fields into an "uber field" using the copyField directive and just search the "uber field". Best, Erick On Thu, May 26, 2016 at 7:35 AM, kostali hassan wrote: > thank you it make sence . > have a good day > > 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu : > >>

Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread kostali hassan
Thank you, it makes sense. Have a good day. 2016-05-26 15:31 GMT+01:00 Siddhartha Singh Sandhu : > The schema.xml/managed_schema defines the default search field as `text`. > > You can make all fields that you want searchable type `text`. > > On Thu, May 26, 2016 at 10:23 AM, kostali hassan < > med

Re: "data import handler : import data from sql database :how to search in all fields"

2016-05-26 Thread Siddhartha Singh Sandhu
The schema.xml/managed_schema defines the default search field as `text`. You can make all fields that you want searchable type `text`. On Thu, May 26, 2016 at 10:23 AM, kostali hassan wrote: > I import data from sql databases with DIH . I am looking for serch term in > all fields not by field.

Re: Data Import Handler - Multivalued fields - splitBy

2016-02-27 Thread saravanan1980
It's resolved after changing my column name..its all case sensitive... -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-Handler-Multivalued-fields-splitBy-tp4243667p4260301.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Data Import Handler - Multivalued fields - splitBy

2016-02-27 Thread saravanan1980
I am also having the same problem. Have you resolved this issue? "response": { "numFound": 3, "start": 0, "docs": [ { "genre": [ "Action|Adventure", "Action", "Adventure" ] }, { "genre": [ "Drama|Suspens

Re: Data Import Handler Usage

2016-02-16 Thread vidya
Hi Dataimport section in web ui page still shows me that no data import handler is defined. And no data is being added to my new collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-Handler-Usage-tp4257518p4257576.html Sent from the Solr - User mailing li

Re: Data Import Handler Usage

2016-02-16 Thread Erik Hatcher
The "other" collection (destination of the import) is the collection where that data import handler definition resides. Erik > On Feb 16, 2016, at 01:54, vidya wrote: > > Hi > > I have gone through documents to define data import handler in solr. But i > couldnot implement it. > I have cr

Re: Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Susheel Kumar
You can start with one of the suggestions from this link based on your indexing and query load. https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Thanks, Susheel On Mon, Feb 8, 2016 at 10:15 AM, Troy Edwards wrote: > We are running the

Re: Data Import Handler - autoSoftCommit and autoCommit

2016-02-08 Thread Rajesh Hazari
we have this for a collection which updated every 3mins with min of 500 documents and once in a day of 10k documents in start of the day ${solr.autoCommit.maxTime:30} 1 true true ${solr.autoSoftCommit.maxTime:6000} As per solr documentation, If

Re: Data Import Handler takes different time on different machines

2016-02-03 Thread Troy Edwards
While researching the space on the servers, I found that log files from Sept 2015 are still there. These are solr_gc_log_datetime and solr_log_datetime. Is the default logging for Solr ok for production systems or does it need to be changed/tuned? Thanks, On Tue, Feb 2, 2016 at 2:04 PM, Troy Edw

Re: Data Import Handler takes different time on different machines

2016-02-02 Thread Troy Edwards
That is helpful! Thank you for the thoughts. On Tue, Feb 2, 2016 at 12:17 PM, Erick Erickson wrote: > Scratch that installation and start over? > > Really, it sounds like something is fundamentally messed up with the > Linux install. Perhaps something as simple as file paths, or you have > old ja

Re: Data Import Handler takes different time on different machines

2016-02-02 Thread Erick Erickson
Scratch that installation and start over? Really, it sounds like something is fundamentally messed up with the Linux install. Perhaps something as simple as file paths, or you have old jars hanging around that are mis-matched. Or someone manually deleted files from the Solr install. Or your disk f

Re: Data Import Handler takes different time on different machines

2016-02-02 Thread Troy Edwards
Rerunning the Data Import Handler again on the the linux machine has started producing some errors and warnings: On the node on which DIH was started: WARN SolrWriter Error creating document : SolrInputDocument org.apache.solr.common.SolrException: No registered leader was found after waiting fo

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Erick Erickson
The first thing I'd be looking at is how the JDBC batch size compares between the two machines. AFAIK, Solr shouldn't notice the difference, and since a large majority of the development is done on Linux-based systems, I'd be surprised if this was worse than Windows, which would lead me to t

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Troy Edwards
Sorry, I should explain further. The Data Import Handler had been running for a while retrieving only about 15 records from the database. Both in development env (windows) and linux machine it took about 3 mins. The query has been changed and we are now trying to retrieve about 10 million reco

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Erick Erickson
What happens if you run just the SQL query from the windows box and from the linux box? Is there any chance that somehow the connection from the linux box is just slower? Best, Erick On Mon, Feb 1, 2016 at 6:36 PM, Alexandre Rafalovitch wrote: > What are you importing from? Is the source and Sol

Re: Data Import Handler takes different time on different machines

2016-02-01 Thread Alexandre Rafalovitch
What are you importing from? Is the source and Solr machine collocated in the same fashion on dev and prod? Have you tried running this on a Linux dev machine? Perhaps your prod machine is loaded much more than a dev. Regards, Alex. Newsletter and resources for Solr beginners and intermed

Re: Data import issue

2015-12-25 Thread Alexandre Rafalovitch
Do you have a full stack trace? A bit hard to help without that. On 24 Dec 2015 2:54 pm, "Midas A" wrote: > Hi , > > > Please provide the steps to resolve the issue. > > > com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: > Communications link failure during rollback(). Transa

Re: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Brian Narsi
That was it! Thank you! On Fri, Dec 4, 2015 at 3:13 PM, Dyer, James wrote: > Brian, > > Be sure to have... > > transformer="RegexTransformer" > > ...in your <entity> tag. It’s the RegexTransformer class that looks > for "splitBy". > > See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer

RE: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Dyer, James
Brian, Be sure to have... transformer="RegexTransformer" ...in your <entity> tag. It’s the RegexTransformer class that looks for "splitBy". See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more information. James Dyer Ingram Content Group -Original Message- From: Br
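Because "splitBy" is treated as a regular expression, a delimiter such as `|` has to be escaped or it is read as alternation. The snippet below illustrates this with Python's `re` module rather than the Java regex engine Solr uses; the two agree on this point. The sample value is taken from the "genre" field shown elsewhere in this thread, and the helper name is made up.

```python
import re

def split_by(value, pattern):
    """Split a field value on a regular expression, as a splitBy-style option would."""
    return re.split(pattern, value)

# The pipe is escaped so it matches a literal "|" character.
print(split_by("Action|Adventure", r"\|"))   # ['Action', 'Adventure']
```

In an XML config the same pattern would be written as `\|` inside the attribute value.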

Re: Data Import Handler / Backup indexes

2015-11-23 Thread Jeff Wartes
The backup/restore approach in SOLR-5750 and in solrcloud_manager is really just that - copying the index files. On backup, it saves your index directories, and on restore, it puts them in the data dir, moves a pointer for the current index dir, and opens a new searcher. Both are mostly just wrapp

Re: Data Import Handler / Backup indexes

2015-11-22 Thread Erick Erickson
These are just Lucene indexes. There's the Cloud backup and restore that is being worked on. But if the index is static (i.e. not being indexed to), simply copying the data/index (well, actually the whole data index and subdirs) directory will backup and restore it. Copying the index directory bac
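A minimal sketch of the file-copy backup described above, assuming the index is static while the copy runs (nothing is being indexed). The function name, directory layout, and label are hypothetical; only the idea of copying the whole data directory comes from the message.

```python
import shutil
from pathlib import Path

def backup_data_dir(data_dir, backup_root, label):
    """Copy a core's whole data directory (index and subdirectories) to backup_root/label."""
    src = Path(data_dir)
    dest = Path(backup_root) / label
    # copytree raises an error if dest already exists, so old backups are not overwritten.
    shutil.copytree(src, dest)
    return dest
```

Restoring is the reverse copy, done while the core is not using the directory.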

Re: Data Import Handler / Backup indexes

2015-11-21 Thread Brian Narsi
What are the caveats regarding the copy of a collection? At this time DIH takes only about 10 minutes. So in case of accidental delete we can just re-run the DIH. The reason I am thinking about backup is just in case records are deleted accidentally and the DIH cannot be run because the database i

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Jeff Wartes
https://github.com/whitepages/solrcloud_manager supports 5.x, and I added some backup/restore functionality similar to SOLR-5750 in the last release. Like SOLR-5750, this backup strategy requires a shared filesystem, but note that unlike SOLR-5750, I haven’t yet added any backup functionality for

Re: Data Import Handler / Backup indexes

2015-11-17 Thread Brian Narsi
Sorry I forgot to mention that we are using SolrCloud 5.1.0. On Tue, Nov 17, 2015 at 12:09 PM, KNitin wrote: > afaik Data import handler does not offer backups. You can try using the > replication handler to backup data as you wish to any custom end point. > > You can also try out : https://gi

Re: Data Import Handler / Backup indexes

2015-11-17 Thread KNitin
afaik Data import handler does not offer backups. You can try using the replication handler to backup data as you wish to any custom end point. You can also try out : https://github.com/bloomreach/solrcloud-haft. This helps backup solr indices across clusters. On Tue, Nov 17, 2015 at 7:08 AM, Br

Re: data import extremely slow

2015-11-07 Thread Yangrui Guo
Thanks for your kind reply. I tried both using SqlEntityProcessor and setting batchSize to -1 but didn't get any improvement. It'd be helpful if I can see data import handler's log. On Saturday, November 7, 2015, Alexandre Rafalovitch wrote: > LoL. Of course I meant SolrJ. I had to misspell the most

Re: Data import handler not indexing all data

2015-11-07 Thread Yangrui Guo
Yes the id is unique. If I only select distinct id,count(id) I get the same results. However I found this is more likely a MySQL issue. I created a new table called director1 and ran query "insert into director1 select * from director" I got only 287041 results inserted, which was the same as Solr.

Re: data import extremely slow

2015-11-07 Thread Alexandre Rafalovitch
LoL. Of course I meant SolrJ. I had to misspell the most important word of the hundreds I wrote in this thread :-) Thank you Erick for the correction. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/ On 7 November 2015 at 19:18, Erick Erickson wro

Re: data import extremely slow

2015-11-07 Thread Erick Erickson
Alexandre, did you mean SolrJ? Here's a way to get started https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/ Best, Erick On Sat, Nov 7, 2015 at 2:22 PM, Alexandre Rafalovitch wrote: > Have you thought of just using Solr. Might be faster than troubleshooting > DIH for complex scenarios

Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
That's not quite the question I asked. Do a distinct on 'id' only in the database itself. If your ids are NOT unique, you need to create a composite or a virtual id for Solr. Because whatever your schema.xml says is the uniqueKey will be used to deduplicate the documents. If you have 10 documents wi
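
To make the deduplication point concrete, here is a toy Python model of it. This is not Solr code: `index_documents` and the sample rows are invented for illustration, and a plain dict stands in for the index.

```python
def index_documents(docs, unique_key="id"):
    """Toy model of uniqueKey handling: a later document with the same
    key value replaces the earlier one instead of being added."""
    index = {}
    for doc in docs:
        index[doc[unique_key]] = doc
    return index

# Hypothetical rows where 'id' is not actually unique in the source table.
rows = [
    {"id": 1, "name": "first"},
    {"id": 1, "name": "second"},
    {"id": 2, "name": "third"},
]

index = index_documents(rows)
print(len(rows), "rows read,", len(index), "documents in the index")
```

This is why the row count in the database and the document count in Solr can differ when the chosen key column contains repeats.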

Re: Data import handler not indexing all data

2015-11-07 Thread Yangrui Guo
Hi thanks for the continued support. I'm really worried as my project deadline is near. It was 1636549 in MySQL vs 287041 in Solr. I put select distinct in the beginning of the query because IMDB doesn't have a table for cast & crew. It puts movie and person and their roles into one huge table 'cas

Re: Data import handler not indexing all data

2015-11-07 Thread Alexandre Rafalovitch
Just to get the paranoid option out of the way, is 'id' actually the column that has unique ids in your database? If you do "select distinct id from imdb.director" - how many items do you get? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-s

Re: data import extremely slow

2015-11-07 Thread Alexandre Rafalovitch
Have you thought of just using Solr. Might be faster than troubleshooting DIH for complex scenarios. On 7 Nov 2015 3:39 pm, "Yangrui Guo" wrote: > I found multiple strange things besides the slowness. I performed count(*) > in MySQL but only one-fifth of the records were imported. Also sometimes

Re: data import extremely slow

2015-11-07 Thread Yangrui Guo
I found multiple strange things besides the slowness. I performed count(*) in MySQL but only one-fifth of the records were imported. Also sometimes dataimporthandler either doesn't import at all or only imports a portion of the table. How can I debug the importer? On Saturday, November 7, 2015, Y

Re: data import extremely slow

2015-11-07 Thread Yangrui Guo
I just realized that not everything was ok. Three child entities were not imported. Had set batchSize to -1 but again solr was stuck :( On Fri, Nov 6, 2015 at 3:11 PM, Yangrui Guo wrote: > Thanks for the reply. I just removed CacheKeyLookUp and CachedKey and used > WHERE clause instead. Everythi
