Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
si...@anant.us > > Anant Corporation > > On Apr 12, 2018, 1:10 PM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>, > wrote: > > Thanks Rahul. Data source is JdbcDataSource with MySQL database. Data > size > > is around 100GB. > > I am not much familiar with spark

Re: DIH with huge data

2018-04-12 Thread Sujay Bawaskar
.si...@gmail.com> wrote: > How much data and what is the database source? Spark is probably the > fastest way. > > -- > Rahul Singh > rahul.si...@anant.us > > Anant Corporation > > On Apr 12, 2018, 7:28 AM -0400, Sujay Bawaskar <sujaybawas...@gmail.com>,

DIH with huge data

2018-04-12 Thread Sujay Bawaskar
Hi, We are using DIH with SortedMapBackedCache, but as the data size increases we need to give the Solr JVM more heap memory. Can we use multiple CSV files instead of database queries, and later join the data in the CSV files using zipper? So the bottom line is to create CSV files for each entity in
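
The zipper idea raised in this message can be sketched as a merge join over two inputs that are already sorted by the join key, so that only one parent row and its matching child rows are in memory at any time. This is a minimal illustration of the general technique, not DIH's own implementation; the function name `zipper_join`, the column names, and the sample CSV data are all hypothetical.

```python
import csv
import io
from itertools import groupby

# Hypothetical sample data: both "files" are sorted by the join column "id".
PARENTS_CSV = "id,name\n1,alpha\n2,beta\n3,gamma\n"
CHILDREN_CSV = "id,tag\n1,x\n1,y\n3,z\n"


def zipper_join(parent_rows, child_rows, key):
    """Merge-join two row streams that are both sorted by `key`.

    Assumes keys compare in the same order the inputs were sorted in
    (plain string comparison here). Yields (parent, [children]) pairs.
    """
    children = groupby(child_rows, key=lambda row: row[key])
    child_key, child_group = next(children, (None, None))
    for parent in parent_rows:
        # Skip child groups whose key sorts before the current parent.
        while child_key is not None and child_key < parent[key]:
            child_key, child_group = next(children, (None, None))
        matched = []
        if child_key == parent[key]:
            # Consume the group before advancing the groupby iterator.
            matched = list(child_group)
            child_key, child_group = next(children, (None, None))
        yield parent, matched


parents = csv.DictReader(io.StringIO(PARENTS_CSV))
children = csv.DictReader(io.StringIO(CHILDREN_CSV))
for parent, kids in zipper_join(parents, children, "id"):
    print(parent["name"], [kid["tag"] for kid in kids])
```

Unlike a map-backed cache, which loads the whole child table into the heap before joining, this approach trades memory for the requirement that every input be sorted on the join key up front.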

Re: Solr OOM Crashes / JVM tuning advice

2018-04-11 Thread Sujay Bawaskar
Which directory factory is defined in solrconfig.xml? Your JVM heap should be tuned with respect to that. How is Solr being used: more updates and fewer queries, or fewer updates and more queries? What is the OOM error? Is it frequent GC or Error 12? On Wed, Apr 11, 2018 at 6:05 PM, Adam

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
n min intervals. > > Thank you, > Dutt > > > -Original Message- > From: Sujay Bawaskar [mailto:sujaybawas...@gmail.com] > Sent: Tuesday, March 27, 2018 8:32 AM > To: solr-user@lucene.apache.org > Subject: Re: Help Needed - Indexing Related > > Few questions

Re: Help Needed - Indexing Related

2018-03-27 Thread Sujay Bawaskar
A few questions here: Are you using the SolrJ client from a Java application? What is the version of Solr? How frequently are commit and optimize operations called from the Solr client? If commit and optimize are not called from the client, what are the values of solr.autoCommit.maxTime and solr.autoSoftCommit.maxTime? What

Re: Reg:- Indexing MySQL data with Solr

2017-12-01 Thread Sujay Bawaskar
You can use the Data Import Handler with a cache; it's faster. See the documentation: https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html On Sat, Dec 2, 2017 at 12:21 AM, @Nandan@ wrote: > Hi , > I am working on an

Re: DIH not stop

2017-11-16 Thread Sujay Bawaskar
1638215613-14) [ x:cea2] > o.a.s.c.S.Request [cea2] webapp=/solr path=/dataimport > params={indent=on=json=status&_=1510816148489} status=0 QTime=0 > 2017-11-16 07:21:40.064 INFO (qtp1638215613-14) [ x:cea2] > o.a.s.c.S.Request [cea2] webapp=/solr path=/dataimport > params

Re: DIH not stop

2017-11-15 Thread Sujay Bawaskar
I experienced this problem recently with MySQL, and after checking solr.log I found that there was a connection timeout from MySQL. Please check solr.log for any Cassandra connection errors. Thanks, Sujay On Thu, Nov 16, 2017 at 12:29 PM, Can Ezgi Aydemir wrote: > Hi

Re: Solr server partial update is very slow

2017-11-12 Thread Sujay Bawaskar
for partial updates. Thanks, Sujay On Mon, Nov 13, 2017 at 1:56 AM, Shawn Heisey <apa...@elyograg.org> wrote: > On 11/11/2017 8:17 AM, Sujay Bawaskar wrote: > > Thanks Shawn. Its good to know that OpenSearcher is not causing any issue. >> >> We are good with 15 minutes

Re: Solr server partial update is very slow

2017-11-10 Thread Sujay Bawaskar
. On Fri, Nov 10, 2017 at 12:27 PM, Sujay Bawaskar <sujaybawas...@gmail.com> wrote: > Any reason we get below log even if client does not issue commit or we can > ignore this log? > > Log: 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [ x:collection] > o.a.s.s.Solr

Re: Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
Is there any reason we get the log below even though the client does not issue a commit, or can we ignore it? Log: 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [ x:collection] o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection] realtime] On Fri, Nov 10, 2017 at 12:06 PM, Sujay Bawaskar

Re: Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
all depends on how long after you > index a document it has to be available for search. > > The settings you have are dangerous. See: > > https://lucidworks.com/2013/08/23/understanding- > transaction-logs-softcommit-and-commit-in-sorlcloud/ > > Best, > Erick > > On Th

Solr server partial update is very slow

2017-11-09 Thread Sujay Bawaskar
Hi, We are getting the log below after every partial update call, without invoking a commit operation. We have configured the soft commit and commit times as below. With this configuration we are able to perform 800 partial updates per minute, which I think is very slow. Our index size is 10GB for this

Re: Issue with delta import

2017-07-26 Thread Sujay Bawaskar
Can you please try ${dih.last_index_time} instead of ${dataimporter.last_index_time}? On Wed, Jul 26, 2017 at 2:33 PM, bhargava ravali koganti < ravali@gmail.com> wrote: > Hi, > > I'm trying to integrate Solr and Cassandra. I'm facing problem with delta > import. For every 10 minutes I'm

Re: Parent child documents partial update

2017-07-18 Thread Sujay Bawaskar
ntricacy of the index. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com/lucidworks > LinkedIn: https://www.linkedin.com/in/sarkaramrit2 > > On Tue, Jul 18, 2017 at 8:11 AM, Sujay Bawaskar <sujaybawas

Re: Parent child documents partial update

2017-07-17 Thread Sujay Bawaskar
will fetch > the child documents along with it. > > I am not sure whether this can be done with current code or it will be > fixed / improved in the future. > > Amrit Sarkar > Search Engineer > Lucidworks, Inc. > 415-589-9269 > www.lucidworks.com > Twitter http://twitter.com

Parent child documents partial update

2017-07-17 Thread Sujay Bawaskar
Hi, I need help understanding Solr's partial update behaviour for parent-child documents. Can we perform a partial update on a parent document without losing its child documents? My observation is that the parent-child relationship between documents gets lost when a partial update is performed on the parent. Any
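
For context, and assuming the "partial update" in this message means Solr's atomic update syntax, the request body for such an update on a parent document might be built as sketched below. The helper name `atomic_update`, the unique key field `id`, and the field and document names are hypothetical. The sketch only shows the payload shape; it does not settle whether child documents survive the update, which is the question being asked.

```python
import json


def atomic_update(doc_id, changes):
    """Build a JSON body for a Solr atomic update.

    Each changed field is wrapped in a {"set": value} modifier so that
    only that field is replaced. Assumes the unique key field is "id".
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    # The update handler accepts a JSON array of documents.
    return json.dumps([doc])


# Hypothetical parent document id and field name.
print(atomic_update("parent-1", {"title": "New title"}))
```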

Re: DIH delta import with cache 5.3.1 issue

2017-06-20 Thread Sujay Bawaskar
is behaviour of delta import with caching is not similar to that of full import with caching. If the delta query selects 10 elements, then it's like executing a select-all query on the database for all ten records. Any comment on this behaviour of delta import? On Thu, Mar 16, 2017 at 7:47 PM, Sujay Bawaskar

Re: Data Import

2017-03-17 Thread Sujay Bawaskar
Hi Vishal, In my experience DIH is the best option for indexing from an RDBMS into Solr. DIH with caching has the best performance. DIH nested entities allow you to define simple queries. Also, SolrJ is good when you want your RDBMS updates made available in Solr immediately. DIH full import can be used for index

Re: DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
le) and then try it on 5.3.1 and 5.4. And then 6.4 if > the problem is still there. If it is still there in 6.4, then we may > have a new bug. > > Regards, >Alex. > > http://www.solr-start.com/ - Resources for Solr users, new and experienced > > > On 16

Re: DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
you give a bit more details. Do you mean one document gets the > content of multiple documents? And only on delta? > > Regards, > Alex > > On 16 Mar 2017 8:53 AM, "Sujay Bawaskar" <sujay.bawas...@firstfuel.com> > wrote: > > Hi, > > We are using DIH

DIH delta import with cache 5.3.1 issue

2017-03-16 Thread Sujay Bawaskar
Hi, We are using DIH with a cache (SortedMapBackedCache) on Solr 5.3.1. We have around 2.8 million documents in Solr and the total index size is 4 GB. DIH delta import is dumping all values of mapped columns into their respective multi-valued fields. This is causing the size of one Solr document to grow up to 2 GB.