Re: What is the right way to bring a failed SolrCloud node back online?
We are working on a new mode (which should become the default) where ZooKeeper will be treated as the truth for a cluster. This mode will be able to handle situations like this - if the cluster state says a core should exist on a node and it doesn’t, it will be created on startup. The way things work currently is this kind of hybrid situation where the truth is partly in ZooKeeper and partly on each node. This is not ideal at all. I think this new mode is very important, and it will be coming shortly. Until then, I’d recommend writing this logic externally as you suggest (I’ve seen it done before). - Mark http://about.me/markrmiller On Jan 24, 2014, at 12:01 PM, Nathan Neulinger nn...@neulinger.org wrote: I have an environment where new collections are being added frequently (isolated per customer), and the backup is virtually guaranteed to be missing some of them. As it stands, bringing up the restored/out-of-date instance results in those collections being stuck in 'Recovering' state, because the cores don't exist on the resulting server. This can also be extended to the case of restoring a completely blank instance. Is there any way to tell SolrCloud "try recreating any missing cores for this collection based on where you know they should be located"? Or do I need to actually determine a list of cores (..._shardX_replicaY) and trigger the core creates myself, at which point I gather that it will start recovery for each of them? -- Nathan Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412
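For anyone landing on this thread later, here is a rough sketch of the kind of external logic being discussed above (this is not the SOLR-5665 script; it assumes SolrJ 4.x, and the ZooKeeper address, node name and URLs are placeholders you would substitute):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.cloud.*;

public class RecreateMissingCores {
  public static void main(String[] args) throws Exception {
    String zkHost = "zk1:2181,zk2:2181,zk3:2181";
    String nodeName = "host1:8983_solr";             // the node being restored
    String coreAdminUrl = "http://host1:8983/solr";  // its CoreAdmin endpoint

    CloudSolrServer cloud = new CloudSolrServer(zkHost);
    cloud.connect();
    ClusterState state = cloud.getZkStateReader().getClusterState();

    HttpSolrServer admin = new HttpSolrServer(coreAdminUrl);
    for (String collection : state.getCollections()) {
      for (Slice slice : state.getSlices(collection)) {
        for (Replica replica : slice.getReplicas()) {
          if (!nodeName.equals(replica.getStr("node_name"))) continue;
          String coreName = replica.getStr("core");  // e.g. mycoll_shard1_replica2
          // A real script would first check CoreAdmin STATUS and skip cores
          // that already exist locally; creating the core kicks off recovery.
          CoreAdminRequest.Create create = new CoreAdminRequest.Create();
          create.setCoreName(coreName);
          create.setCollection(collection);
          create.setShardId(slice.getName());
          admin.request(create);
        }
      }
    }
    cloud.shutdown();
  }
}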
Re: Solr server requirements for 100+ million documents
Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go the SolrJ route, you also have the possibility of setting up N clients to work simultaneously, sometimes that'll help. FWIW, Erick On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Kranti, Attach are the solrconfig schema xml for review. I did run indexing with just few fields (5-6 fields) in schema.xml keeping the same db config but Indexing almost still taking similar time (average 1 million records 1 hr) which confirms that the bottleneck is in the data acquisition which in our case is oracle database. I am thinking to not use dataimporthandler / jdbc to get data from Oracle but to rather dump data somehow from oracle using SQL loader and then index it. Any thoughts? Thnx -Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Saturday, January 25, 2014 12:08 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents can you post the complete solrconfig.xml file and schema.xml files to review all of your settings that would impact your indexing performance. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Svante. Your indexing speed using db seems to really fast. Can you please provide some more detail on how you are indexing db records. Is it thru DataImportHandler? And what database? Is that local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in all fields. The average size of document is in 5-10 kbs. -Original Message- From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of svante karlsson Sent: Friday, January 24, 2014 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents I just indexed 100 million db docs (records) with 22 fields (4 multivalued) in 9524 sec using libcurl. 11 million took 763 seconds so the speed drops somewhat with increasing dbsize. We write 1000 docs (just an arbitrary number) in each request from two threads. If you will be using solrcloud you will want more writer threads. The hardware is a single cheap hp DL320E GEN8 V2 1P E3-1220V3 with one SSD and 32GB and the solr runs on ubuntu 13.10 inside a esxi virtual machine. /svante 2014/1/24 Susheel Kumar susheel.ku...@thedigitalgroup.net Thanks, Erick for the info. For indexing I agree the more time is consumed in data acquisition which in our case from Database. For indexing currently we are using the manual process i.e. Solr dashboard Data Import but now looking to automate. How do you suggest to automate the index part. Do you recommend to use SolrJ or should we try to automate using Curl? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, January 24, 2014 2:59 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Can't be done with the information you provided, and can only be guessed at even with more comprehensive information. 
Here's why: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Also, at a guess, your indexing speed is so slow due to data acquisition; I rather doubt you're being limited by raw Solr indexing. If you're using SolrJ, try commenting out the server.add() bit and running again. My guess is that your indexing speed will be almost unchanged, in which case the data acquisition process is where you should concentrate your efforts. As a comparison, I can index 11M Wikipedia docs on my laptop in 45 minutes without any attempts at parallelization. Best, Erick On Fri, Jan 24, 2014 at 12:10 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, Currently we are indexing 10 million documents from the database (10 db data entities); index size is around 8 GB on a Windows virtual box. Indexing in one shot takes 12+ hours, while indexing in parallel in separate cores and merging them together takes 4+ hours. We are looking to scale to 100+ million documents and are looking for recommendations on server requirements for the below parameters for a production environment. There can be 200+ users performing searches at the same time. No of physical servers (considering SolrCloud) Memory requirement Processor requirement (# cores) Linux as OS as opposed to Windows Thanks in advance. Susheel
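As a rough sketch of the timing test Erick describes (SolrJ 4.x assumed; fetchNextBatch() is a placeholder for whatever JDBC or file reading you already do):

import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexingTimer {
  // Placeholder for your own data-acquisition code (JDBC, dump files, ...).
  static List<SolrInputDocument> fetchNextBatch() { return null; }

  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    long start = System.currentTimeMillis();
    List<SolrInputDocument> batch;
    while ((batch = fetchNextBatch()) != null) {
      server.add(batch);  // comment out this line and rerun: if the elapsed time
                          // barely changes, data acquisition is the bottleneck
    }
    server.commit();
    System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));
    server.shutdown();
  }
}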
Re: What is the right way to bring a failed SolrCloud node back online?
Thanks, yeah, I did just that - and sent the script in on SOLR-5665 if anyone wants a copy. The script is trivial, but you're welcome to stick it in contrib or something if it's at all useful to anyone. -- Nathan On 01/26/2014 08:28 AM, Mark Miller wrote: We are working on a new mode (which should become the default) where ZooKeeper will be treated as the truth for a cluster. This mode will be able to handle situations like this - if the cluster state says a core should exist on a node and it doesn’t, it will be created on startup. The way things work currently is this kind of hybrid situation where the truth is partly in ZooKeeper and partly on each node. This is not ideal at all. I think this new mode is very important, and it will be coming shortly. Until then, I’d recommend writing this logic externally as you suggest (I’ve seen it done before). - Mark http://about.me/markrmiller On Jan 24, 2014, at 12:01 PM, Nathan Neulinger nn...@neulinger.org wrote: I have an environment where new collections are being added frequently (isolated per customer), and the backup is virtually guaranteed to be missing some of them. As it stands, bringing up the restored/out-of-date instance results in those collections being stuck in 'Recovering' state, because the cores don't exist on the resulting server. This can also be extended to the case of restoring a completely blank instance. Is there any way to tell SolrCloud "try recreating any missing cores for this collection based on where you know they should be located"? Or do I need to actually determine a list of cores (..._shardX_replicaY) and trigger the core creates myself, at which point I gather that it will start recovery for each of them? -- Nathan Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412 -- Nathan Neulinger nn...@neulinger.org Neulinger Consulting (573) 612-1412
Fwd: Search Engine Framework decision
Hi, I want to create a POC to search our INTRANET along with documents uploaded to the intranet. Documents (PDF, Excel, Word documents, text files, images, videos) also exist on SHAREPOINT. SharePoint has authentication access at the module level (folder level). My intranet website is http://myintranet/ http://sparsh/ and the SharePoint URL is different. Documents also exist in file folders. I have the below queries: A) Which crawler framework do I use along with Solr for this POC, Nutch or Apache ManifoldCF? B) Is it possible to crawl SharePoint documents using Nutch? If yes, would a configuration-level change alone make this possible, or do I have to write code to parse and send to Solr? C) Which versions of Solr+Nutch+MCF should be used? The Nutch version has a dependency on the Solr version. Would Nutch 1.7 work properly with Solr 4.6.0? -- Rashmi Be the change that you want to see in this world! -- Rashmi Be the change that you want to see in this world! www.minnal.zor.org disha.resolve.at www.artofliving.org
RE: Solr server requirements for 100+ million documents
Thank you Erick for your valuable inputs. Yes, we have to re-index data again and again. I'll look into the possibility of tuning db access. On SolrJ and automating the indexing (incremental as well as one time) I want to get your opinion on the below two points. We will be indexing separate sets of tables with similar data structure - Should we use SolrJ and write Java programs that can be scheduled to trigger indexing on demand/schedule based? - Is using SolrJ a better idea even for searching than using SolrNet? Our frontend is in .Net, so we started using SolrNet, but I am afraid that down the road when we scale/support SolrCloud, SolrJ would be better? Thanks Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, January 26, 2014 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go the SolrJ route, you also have the possibility of setting up N clients to work simultaneously, sometimes that'll help. FWIW, Erick On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Kranti, Attach are the solrconfig schema xml for review. I did run indexing with just few fields (5-6 fields) in schema.xml keeping the same db config but Indexing almost still taking similar time (average 1 million records 1 hr) which confirms that the bottleneck is in the data acquisition which in our case is oracle database. I am thinking to not use dataimporthandler / jdbc to get data from Oracle but to rather dump data somehow from oracle using SQL loader and then index it. Any thoughts? Thnx -Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Saturday, January 25, 2014 12:08 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents can you post the complete solrconfig.xml file and schema.xml files to review all of your settings that would impact your indexing performance. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Svante. Your indexing speed using db seems to really fast. Can you please provide some more detail on how you are indexing db records. Is it thru DataImportHandler? And what database? Is that local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in all fields. The average size of document is in 5-10 kbs. -Original Message- From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of svante karlsson Sent: Friday, January 24, 2014 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents I just indexed 100 million db docs (records) with 22 fields (4 multivalued) in 9524 sec using libcurl. 11 million took 763 seconds so the speed drops somewhat with increasing dbsize. We write 1000 docs (just an arbitrary number) in each request from two threads. If you will be using solrcloud you will want more writer threads. The hardware is a single cheap hp DL320E GEN8 V2 1P E3-1220V3 with one SSD and 32GB and the solr runs on ubuntu 13.10 inside a esxi virtual machine.
/svante 2014/1/24 Susheel Kumar susheel.ku...@thedigitalgroup.net Thanks, Erick for the info. For indexing I agree the more time is consumed in data acquisition which in our case is from the Database. For indexing currently we are using the manual process i.e. Solr dashboard Data Import but now looking to automate. How do you suggest to automate the index part. Do you recommend to use SolrJ or should we try to automate using Curl? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, January 24, 2014 2:59 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Can't be done with the information you provided, and can only be guessed at even with more comprehensive information. Here's why: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Also, at a guess, your indexing speed is so slow due to data acquisition; I rather doubt you're being limited by raw Solr indexing. If you're using SolrJ, try commenting out the server.add() bit and running again. My guess is that your indexing speed will be almost unchanged, in which case the data acquisition process is where you should concentrate your efforts. As a comparison, I can index 11M Wikipedia docs on my
Re: Fwd: Search Engine Framework decision
Rashmi, As far as I know Nutch is a web crawler. I don't think it can crawl documents from Microsoft SharePoint. ManifoldCF is a better fit in your case. Regarding versioning, if you don't have previous setups, then use the latest versions of each. Ahmet On Sunday, January 26, 2014 5:24 PM, rashmi maheshwari maheshwari.ras...@gmail.com wrote: Hi, I want to create a POC to search our INTRANET along with documents uploaded to the intranet. Documents (PDF, Excel, Word documents, text files, images, videos) also exist on SHAREPOINT. SharePoint has authentication access at the module level (folder level). My intranet website is http://myintranet/ http://sparsh/ and the SharePoint URL is different. Documents also exist in file folders. I have the below queries: A) Which crawler framework do I use along with Solr for this POC, Nutch or Apache ManifoldCF? B) Is it possible to crawl SharePoint documents using Nutch? If yes, would a configuration-level change alone make this possible, or do I have to write code to parse and send to Solr? C) Which versions of Solr+Nutch+MCF should be used? The Nutch version has a dependency on the Solr version. Would Nutch 1.7 work properly with Solr 4.6.0? -- Rashmi Be the change that you want to see in this world! -- Rashmi Be the change that you want to see in this world! www.minnal.zor.org disha.resolve.at www.artofliving.org
Re: Solr server requirements for 100+ million documents
1) That's what I'd do. For incremental updates you might have to create a trigger on the main table and insert rows into another table that is then used to do the incremental updates. This is particularly relevant for deletes. Consider the case where you've ingested all your data and then rows are deleted. Removing those same documents from Solr requires either a) re-indexing everything, or b) getting all the docs in Solr and comparing them with the rows in the DB, etc., which is expensive, or c) recording the changes as above and just processing deletes from the change table. 2) SolrJ is usually the most current. I don't know how much work SolrNet gets. However, under the covers it's all just HTTP calls, and since you have access in either to just adding HTTP parameters, you should be able to get the full functionality out of either. I _think_ that I'd go with whatever you're most comfortable with. Best, Erick On Sun, Jan 26, 2014 at 9:54 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thank you Erick for your valuable inputs. Yes, we have to re-index data again and again. I'll look into the possibility of tuning db access. On SolrJ and automating the indexing (incremental as well as one time) I want to get your opinion on the below two points. We will be indexing separate sets of tables with similar data structure - Should we use SolrJ and write Java programs that can be scheduled to trigger indexing on demand/schedule based? - Is using SolrJ a better idea even for searching than using SolrNet? Our frontend is in .Net, so we started using SolrNet, but I am afraid that down the road when we scale/support SolrCloud, SolrJ would be better? Thanks Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, January 26, 2014 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go the SolrJ route, you also have the possibility of setting up N clients to work simultaneously, sometimes that'll help. FWIW, Erick On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Kranti, Attach are the solrconfig schema xml for review. I did run indexing with just few fields (5-6 fields) in schema.xml keeping the same db config but Indexing almost still taking similar time (average 1 million records 1 hr) which confirms that the bottleneck is in the data acquisition which in our case is oracle database. I am thinking to not use dataimporthandler / jdbc to get data from Oracle but to rather dump data somehow from oracle using SQL loader and then index it. Any thoughts? Thnx -Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Saturday, January 25, 2014 12:08 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents can you post the complete solrconfig.xml file and schema.xml files to review all of your settings that would impact your indexing performance. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Svante. Your indexing speed using db seems to really fast. Can you please provide some more detail on how you are indexing db records.
Is it thru DataImportHandler? And what database? Is that local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in all fields. The average size of document is in 5-10 kbs. -Original Message- From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of svante karlsson Sent: Friday, January 24, 2014 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents I just indexed 100 million db docs (records) with 22 fields (4 multivalued) in 9524 sec using libcurl. 11 million took 763 seconds so the speed drops somewhat with increasing dbsize. We write 1000 docs (just an arbitrary number) in each request from two threads. If you will be using solrcloud you will want more writer threads. The hardware is a single cheap hp DL320E GEN8 V2 1P E3-1220V3 with one SSD and 32GB and the solr runs on ubuntu 13.10 inside a esxi virtual machine. /svante 2014/1/24 Susheel Kumar susheel.ku...@thedigitalgroup.net Thanks, Erick for the info. For indexing I agree the more time is consumed in data acquisition which in our case from Database. For indexing currently we are using the manual process i.e. Solr dashboard Data Import but now looking to automate. How do you suggest to automate the index part.
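To make the change-table idea from Erick's reply above concrete, here is a rough sketch of the delete side of option c). The CHANGE_LOG table, its columns and the connection details are hypothetical - they would be whatever your trigger actually writes:

import java.sql.*;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ProcessDeletes {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    Connection db = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");

    // Collect the ids that the trigger recorded as deleted since the last run.
    List<String> ids = new ArrayList<String>();
    PreparedStatement ps = db.prepareStatement(
        "SELECT doc_id FROM change_log WHERE op = 'DELETE' AND processed = 'N'");
    ResultSet rs = ps.executeQuery();
    while (rs.next()) {
      ids.add(rs.getString(1));
    }
    rs.close();
    ps.close();

    if (!ids.isEmpty()) {
      solr.deleteById(ids);   // remove them from the index
      solr.commit();
      db.createStatement().executeUpdate(
          "UPDATE change_log SET processed = 'Y' WHERE op = 'DELETE'");
    }
    db.close();
    solr.shutdown();
  }
}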
Re: Solr server requirements for 100+ million documents
Erick's probably too modest to say so ;=) , but he wrote a great blog entry on indexing with SolrJ - http://searchhub.org/2012/02/14/indexing-with-solrj/ . I took the guts of the code in that blog and easily customized it to write a very fast indexer (content from MySQL, I excised all the Tika code as I am not using it). You should replace StreamingUpdateSolrServer by ConcurrentUpdateSolrServer and experiment to find the optimal number of threads to configure. -Simon On Sun, Jan 26, 2014 at 11:28 AM, Erick Erickson erickerick...@gmail.comwrote: 1 That's what I'd do. For incremental updates you might have to create a trigger on the main table and insert rows into another table that is then used to do the incremental updates. This is particularly relevant for deletes. Consider the case where you've ingested all your data then rows are deleted. Removing those same documents from Solr requires either a re-indexing everything or b getting all the docs in Solr and comparing them with the rows in the DB etc. This is expensive. c recording the changes as above and just processing deletes from the change table. 2 SolrJ is usually the most current. I don't know how much work SolrNet gets. However, under the covers it's just HTTP calls so since you have access in either to just adding HTTP parameters, you should be able to get the full functionality out of either. I _think_ that I'd go with whatever you're most comfortable with. Best, Erick On Sun, Jan 26, 2014 at 9:54 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thank you Erick for your valuable inputs. Yes, we have to re-index data again again. I'll look into possibility of tuning db access. On SolrJ and automating the indexing (incremental as well as one time) I want to get your opinion on below two points. We will be indexing separate sets of tables with similar data structure - Should we use SolrJ and write Java programs that can be scheduled to trigger indexing on demand/schedule based. - Is using SolrJ a better idea even for searching than using SolrNet? As our frontend is in .Net so we started using SolrNet but I am afraid down the road when we scale/support SolrClod using SolrJ is better? Thanks Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, January 26, 2014 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Dumping the raw data would probably be a good idea. I guarantee you'll be re-indexing the data several times as you change the schema to accommodate different requirements... But it may also be worth spending some time figuring out why the DB access is slow. Sometimes one can tune that. If you go the SolrJ route, you also have the possibility of setting up N clients to work simultaneously, sometimes that'll help. FWIW, Erick On Sat, Jan 25, 2014 at 11:06 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Kranti, Attach are the solrconfig schema xml for review. I did run indexing with just few fields (5-6 fields) in schema.xml keeping the same db config but Indexing almost still taking similar time (average 1 million records 1 hr) which confirms that the bottleneck is in the data acquisition which in our case is oracle database. I am thinking to not use dataimporthandler / jdbc to get data from Oracle but to rather dump data somehow from oracle using SQL loader and then index it. Any thoughts? 
Thnx -Original Message- From: Kranti Parisa [mailto:kranti.par...@gmail.com] Sent: Saturday, January 25, 2014 12:08 AM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents can you post the complete solrconfig.xml file and schema.xml files to review all of your settings that would impact your indexing performance. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Sat, Jan 25, 2014 at 12:56 AM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Thanks, Svante. Your indexing speed using db seems to really fast. Can you please provide some more detail on how you are indexing db records. Is it thru DataImportHandler? And what database? Is that local db? We are indexing around 70 fields (60 multivalued) but data is not populated always in all fields. The average size of document is in 5-10 kbs. -Original Message- From: saka.csi...@gmail.com [mailto:saka.csi...@gmail.com] On Behalf Of svante karlsson Sent: Friday, January 24, 2014 5:05 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents I just indexed 100 million db docs (records) with 22 fields (4 multivalued) in 9524 sec using libcurl. 11 million took 763 seconds so the speed drops somewhat with increasing dbsize. We write 1000 docs (just an arbitrary number) in each
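A rough sketch of the ConcurrentUpdateSolrServer approach described above (SolrJ 4.x; fetchNextBatch() stands in for your own MySQL/Oracle reading code, and the queue size and thread count are exactly the knobs to experiment with):

import java.util.List;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
  // Placeholder for your own data-acquisition code.
  static List<SolrInputDocument> fetchNextBatch() { return null; }

  public static void main(String[] args) throws Exception {
    // buffer up to 10000 docs and send them with 4 background threads
    ConcurrentUpdateSolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 10000, 4);
    List<SolrInputDocument> batch;
    while ((batch = fetchNextBatch()) != null) {
      server.add(batch);           // adds are queued and streamed in the background
    }
    server.blockUntilFinished();   // wait for the queue to drain
    server.commit();
    server.shutdown();
  }
}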
Re: How to handle multiple sub second updates to same SOLR Document
Sent from my iPhone. On 26 Jan 2014, at 06:13, Shalin Shekhar Mangar shalinman...@gmail.com wrote: There is no timestamp versioning as such in Solr but there is a new document based versioning which will allow you to specify your own (externally assigned) versions. See the Document Centric Versioning Constraints section at https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents Sub-second soft auto commit can be expensive but it is hard to say if it will be too expensive for your use-case. You must benchmark it yourself. On Sat, Jan 25, 2014 at 11:51 PM, christopher palm cpa...@gmail.com wrote: I have a scenario where the same SOLR document is being updated several times within a few ms of each other due to how the source system is sending in field updates on the document. The problem I am trying to solve is that the order of these updates isn’t guaranteed once the multi threaded SOLRJ client starts sending them to SOLR, and older updates are overlaying the newer updates on the same document. I would like to use a timestamp versioning so that the older document change won’t be sent into SOLR, but I didn’t see any automated way of doing this based on the document timestamp. Is there a good way to handle this scenario in SOLR 4.6? It seems that we would have to be soft auto committing with a subsecond level as well, is that even possible? Thanks, Chris -- Regards, Shalin Shekhar Mangar.
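A minimal sketch of what the client side looks like with document-based versioning, assuming DocBasedVersionConstraintsProcessorFactory has been configured on the update chain as described on that wiki page, with a long field named (here, hypothetically) my_version_l:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class VersionedUpdate {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-42");
    doc.addField("title", "newer title");
    // Source-system timestamp used as the external version. With the version
    // constraint processor in the chain, an update whose my_version_l is lower
    // than the indexed one is dropped, so out-of-order sends from the
    // multi-threaded client can no longer overwrite newer data.
    doc.addField("my_version_l", 1390760000000L);

    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}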
Tie breakers when sorting equal items
I promised to ask this on the forum just to confirm what I assume is true. Suppose you're returning results using a sort order based on some field (so, not relevancy). For example, suppose it's a date field which indicates when the document was loaded into the Solr index. Suppose two items have exactly the same date/time in the field. Would Solr return the two items in the order in which they were inserted? I would assume that the answer is not necessarily. I know that you can have secondary sort fields if a suitable field exists that would provide the desired functionality. I know that I could set up some kind of numbering scheme that would provide the same result (the customer doesn't want to pay for that). So, I'm really just asking if Solr has any guarantees that when you sort on a field and two items have the same value, they will be sorted in the order they were inserted into the index. Again, I assume the answer is no, but I said I would ask.
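For reference, the usual way to get a deterministic order is an explicit secondary sort on the uniqueKey field, since Solr makes no promise about the relative order of documents with equal sort values. A minimal SolrJ sketch, where load_date is a hypothetical name for the date field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class TieBreakerQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.addSort("load_date", SolrQuery.ORDER.asc);  // primary sort
    q.addSort("id", SolrQuery.ORDER.asc);         // tie-breaker on the uniqueKey
    System.out.println(solr.query(q).getResults().getNumFound());
    solr.shutdown();
  }
}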
Re: How to run a subsequent update query to documents indexed from a dataimport query
Hi all, Any ideas on how to run a reindex update process for all the imported documents from a /dataimport query? Appreciate your help. Thanks, Dileepa On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I did some research on this and found some alternatives useful to my usecase. Please give your ideas. Can I update all documents indexed after a /dataimport query using the last_indexed_time in dataimport.properties? If so can anyone please give me some pointers? What I currently have in mind is something like below; 1. Store the indexing timestamp of the document as a field eg: <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/> 2. Read the last_index_time from the dataimport.properties 3. Query all document id's indexed after the last_index_time and send them through the Stanbol update processor. But I have a question here; Does the last_index_time refer to when the dataimport is started (onImportStart) or when the dataimport is finished (onImportEnd)? If it's the onImportEnd timestamp, then this solution won't work because the timestamp indexed in the document field will be: onImportStart < doc-index-timestamp < onImportEnd. Another alternative I can think of is to trigger an update chain via an EventListener configured to run after a dataimport is processed (onImportEnd). In this case can the context in DIH give the list of document ids processed in the /dataimport request? If so I can send those doc ids with an /update query to run the Stanbol update process. Please give me your ideas and suggestions. Thanks, Dileepa On Wed, Jan 22, 2014 at 6:14 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I have a Solr requirement to send all the documents imported from a /dataimport query to go through another update chain as a separate background process. Currently I have configured my custom update chain in the /dataimport handler itself. But since my custom update process needs to connect to an external enhancement engine (Apache Stanbol) to enhance the documents with some NLP fields, it has a negative impact on the /dataimport process. The solution will be to have a separate update process running to enhance the content of the documents imported from /dataimport. Currently I have configured my custom Stanbol Processor as below in my /dataimport handler.

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">stanbolInterceptor</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="stanbolInterceptor">
  <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

What I need now is to separate the 2 processes of dataimport and stanbol-enhancement. So this is like running a separate re-indexing process periodically over the documents imported from /dataimport for Stanbol fields. The question is how to trigger my Stanbol update process for the documents imported from /dataimport? In Solr, to trigger an /update query we need to know the id and the fields of the document to be updated. In my case I need to run all the documents imported from the previous /dataimport process through a stanbol update.chain. Is there a way to keep track of the document ids imported from /dataimport? Any advice or pointers will be really helpful. Thanks, Dileepa
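A rough sketch of the timestamp approach described in steps 1-3 above, assuming the timestamp field from step 1 and SolrJ 4.x; the field names, chain name and paging are illustrative only:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReprocessSinceLastImport {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String lastIndexTime = "2014-01-26T12:00:00Z";  // read from dataimport.properties

    SolrQuery q = new SolrQuery("timestamp:[" + lastIndexTime + " TO *]");
    q.setFields("id", "content");
    q.setRows(1000);  // page with start/rows for larger result sets

    UpdateRequest update = new UpdateRequest();
    update.setParam("update.chain", "stanbolInterceptor");  // run only the Stanbol chain
    for (SolrDocument found : solr.query(q).getResults()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", found.getFieldValue("id"));
      doc.addField("content", found.getFieldValue("content"));
      // Note: re-adding only some fields replaces the whole document, so in
      // practice either copy every stored field here or use atomic updates.
      update.add(doc);
    }
    update.process(solr);
    solr.commit();
    solr.shutdown();
  }
}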
What is the last_index_time in dataimport.properties?
Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? OR is it the time that the last dataimport process finished? Thanks, Dileepa
Re: What is the last_index_time in dataimport.properties?
Hi Dileepa, It is the time that the last dataimport process started. So it is safe to use it when considering updated documents during the import. Ahmet On Sunday, January 26, 2014 9:10 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? OR is it the time that the last dataimport process finished? Thanks, Dileepa
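For completeness, a small sketch of reading that value back out of dataimport.properties (the path is an assumption - the file sits next to the DIH config in the core's conf directory):

import java.io.FileInputStream;
import java.util.Properties;

public class ReadLastIndexTime {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.load(new FileInputStream("solr/collection1/conf/dataimport.properties"));
    String lastIndexTime = props.getProperty("last_index_time");
    System.out.println("last_index_time = " + lastIndexTime);
    // DIH stores it as "yyyy-MM-dd HH:mm:ss"; convert it to ISO-8601/UTC
    // (e.g. 2014-01-26T12:00:00Z) before using it in a Solr date range query.
  }
}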
Re: What is the last_index_time in dataimport.properties?
Hi Ahmet, Thanks a lot. Does it mean I can use the last_index_time to query documents indexed during the last dataimport request? I need to run a subsequent update process on all documents imported from a dataimport. Thanks, Dileepa On Mon, Jan 27, 2014 at 1:33 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Dileepa, It is the time that the last dataimport process started. So it is safe to use it when considering updated documents during the import. Ahmet On Sunday, January 26, 2014 9:10 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? OR is it the time that the last dataimport process finished? Thanks, Dileepa
Re: What is the last_index_time in dataimport.properties?
Hi, last_index_time is traditionally used to query the database. But it seems that you want to query Solr, right? On Sunday, January 26, 2014 11:15 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi Ahmet, Thanks a lot. Does it mean I can use the last_index_time to query documents indexed during the last dataimport request? I need to run a subsequent update process on all documents imported from a dataimport. Thanks, Dileepa On Mon, Jan 27, 2014 at 1:33 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Dileepa, It is the time that the last dataimport process started. So it is safe to use it when considering updated documents during the import. Ahmet On Sunday, January 26, 2014 9:10 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? OR is it the time that the last dataimport process finished? Thanks, Dileepa
Re: How to run a subsequent update query to documents indexed from a dataimport query
Hi, Here is what I understand from your Question. You have a custom update processor that runs with DIH. But it is slow. You want to run that text enhancement component after DIH. How would this help to speed up things? In this approach you will read/query/search already indexed and committed solr documents and run text enhancement thing on them. Probably this process will add new additional fields. And then you will update these solr documents? Did I understand your use case correctly? On Sunday, January 26, 2014 8:43 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi all, Any ideas on how to run a reindex update process for all the imported documents from a /dataimport query? Appreciate your help. Thanks, Dileepa On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I did some research on this and found some alternatives useful to my usecase. Please give your ideas. Can I update all documents indexed after a /dataimport query using the last_indexed_time in dataimport.properties? If so can anyone please give me some pointers? What I currently have in mind is something like below; 1. Store the indexing timestamp of the document as a field eg: field name=timestamp type=date indexed=true stored=true default=NOW multiValued=false/ 2. Read the last_index_time from the dataimport.properties 3. Query all document id's indexed after the last_index_time and send them through the Stanbol update processor. But I have a question here; Does the last_index_time refer to when the dataimport is started(onImportStart) or when the dataimport is finished (onImportEnd)? If it's onImportEnd timestamp, them this solution won't work because the timestamp indexed in the document field will be : onImportStart doc-index-timestamp onImportEnd. Another alternative I can think of is trigger an update chain via a EventListener configured to run after a dataimport is processed (onImportEnd). In this case can the context in DIH give the list of document ids processed in the /dataimport request? If so I can send those doc ids with an /update query to run the Stanbol update process. Please give me your ideas and suggestions. Thanks, Dileepa On Wed, Jan 22, 2014 at 6:14 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I have a Solr requirement to send all the documents imported from a /dataimport query to go through another update chain as a separate background process. Currently I have configured my custom update chain in the /dataimport handler itself. But since my custom update process need to connect to an external enhancement engine (Apache Stanbol) to enhance the documents with some NLP fields, it has a negative impact on /dataimport process. The solution will be to have a separate update process running to enhance the content of the documents imported from /dataimport. Currently I have configured my custom Stanbol Processor as below in my /dataimport handler. requestHandler name=/dataimport class=solr.DataImportHandler lst name=defaults str name=configdata-config.xml/str str name=update.chainstanbolInterceptor/str /lst /requestHandler updateRequestProcessorChain name=stanbolInterceptor processor class=com.solr.stanbol.processor.StanbolContentProcessorFactory/ processor class=solr.RunUpdateProcessorFactory / /updateRequestProcessorChain What I need now is to separate the 2 processes of dataimport and stanbol-enhancement. So this is like runing a separate re-indexing process periodically over the documents imported from /dataimport for Stanbol fields. 
The question is how to trigger my Stanbol update process to the documents imported from /dataimport? In Solr to trigger /update query we need to know the id and the fields of the document to be updated. In my case I need to run all the documents imported from the previous /dataimport process through a stanbol update.chain. Is there a way to keep track of the documents ids imported from /dataimport? Any advice or pointers will be really helpful. Thanks, Dileepa
Re: How to run a subsequent update query to documents indexed from a dataimport query
Hi Ahmet, On Mon, Jan 27, 2014 at 3:26 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Here is what I understand from your Question. You have a custom update processor that runs with DIH. But it is slow. You want to run that text enhancement component after DIH. How would this help to speed up things? In this approach you will read/query/search already indexed and committed solr documents and run text enhancement thing on them. Probably this process will add new additional fields. And then you will update these solr documents? Did I understand your use case correctly? Yes, that is exactly what I want to achieve. I want to separate out the enhancement process from the dataimport process. The dataimport process will be invoked by a client when new data is added/updated to the mysql database. Therefore the dataimport process with mandatory fields of the documents should be indexed as soon as possible. Mandatory fields are mapped to the data table columns in the data-config.xml and the normal /dataimport process doesn't take much time. The enhancements are done in my custom processor by sending the content field of the document to an external Stanbol[1] server to detect NLP enhancements. Then new NLP fields are added to the document (detected persons, organizations, places in the content) in the custom update processor and if this is executed during the dataimport process, it takes a lot of time. The NLP fields are not mandatory for the primary usage of the application which is to query documents with mandatory fields. The NLP fields are required for custom queries for Person, Organization entities. Therefore the NLP update process should be run as a background process detached from the primary /dataimport process. It should not slow down the existing /dataimport process. That's why I am looking for the best way to achieve my objective. I want to implement a way to separately update the imported documents from /dataimport to detect NLP enhancements. Currently I'm having the idea of adopting a timestamp based approach to trigger a /update query to all documents imported after the last_index_time in dataimport.prop and update them with NLP fields. Hope my requirement is clear :). Appreciate your suggestions. [1] http://stanbol.apache.org/ On Sunday, January 26, 2014 8:43 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi all, Any ideas on how to run a reindex update process for all the imported documents from a /dataimport query? Appreciate your help. Thanks, Dileepa On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I did some research on this and found some alternatives useful to my usecase. Please give your ideas. Can I update all documents indexed after a /dataimport query using the last_indexed_time in dataimport.properties? If so can anyone please give me some pointers? What I currently have in mind is something like below; 1. Store the indexing timestamp of the document as a field eg: field name=timestamp type=date indexed=true stored=true default=NOW multiValued=false/ 2. Read the last_index_time from the dataimport.properties 3. Query all document id's indexed after the last_index_time and send them through the Stanbol update processor. But I have a question here; Does the last_index_time refer to when the dataimport is started(onImportStart) or when the dataimport is finished (onImportEnd)? 
If it's onImportEnd timestamp, them this solution won't work because the timestamp indexed in the document field will be : onImportStart doc-index-timestamp onImportEnd. Another alternative I can think of is trigger an update chain via a EventListener configured to run after a dataimport is processed (onImportEnd). In this case can the context in DIH give the list of document ids processed in the /dataimport request? If so I can send those doc ids with an /update query to run the Stanbol update process. Please give me your ideas and suggestions. Thanks, Dileepa On Wed, Jan 22, 2014 at 6:14 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I have a Solr requirement to send all the documents imported from a /dataimport query to go through another update chain as a separate background process. Currently I have configured my custom update chain in the /dataimport handler itself. But since my custom update process need to connect to an external enhancement engine (Apache Stanbol) to enhance the documents with some NLP fields, it has a negative impact on /dataimport process. The solution will be to have a separate update process running to enhance the content of the documents imported from /dataimport. Currently I have configured my custom Stanbol Processor as below in my /dataimport handler. requestHandler name=/dataimport class=solr.DataImportHandler lst name=defaults str
Re: What is the last_index_time in dataimport.properties?
Yes Ahmet. I want to use the last_index_time to find the documents imported in the last /dataimport process and send them through an update process. I have explained this requirement in my other thread. Thanks, Dileepa On Mon, Jan 27, 2014 at 3:23 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, last_index_time is traditionally used to query the database. But it seems that you want to query Solr, right? On Sunday, January 26, 2014 11:15 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi Ahmet, Thanks a lot. Does it mean I can use the last_index_time to query documents indexed during the last dataimport request? I need to run a subsequent update process on all documents imported from a dataimport. Thanks, Dileepa On Mon, Jan 27, 2014 at 1:33 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Dileepa, It is the time that the last dataimport process started. So it is safe to use it when considering updated documents during the import. Ahmet On Sunday, January 26, 2014 9:10 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, Can I please know what timestamp in the dataimport process is recorded as the last_index_time in dataimport.properties? Is it the time that the last dataimport process started? OR is it the time that the last dataimport process finished? Thanks, Dileepa
Complication - can block joins help?
OK, In order to do boosting, we often will create a dynamic field in SOLR. For example: A professional hires out for work, and I want to boost those who do woodworking. George Smith builds chairs and builds desks. He builds the most desks in the country (350 a year), and his closest competitor does 200 a year. id (integer) = 1 name (string) = George Smith work (multiValued field) = chairs, desks num_desk (dynamic field num*) = 500 Then I would do something like: q=num_desk^5.0 Is there a way to do this without a dynamic field? I thought about a field: desk|500 (use bar delimiter). But I couldn't see how to have the value indexed so it could easily be used to boost those who do the most. If you think of all the types of work, this could be 50,000 dynamic fields. Probably a performance hog. Dr. Smith - Angioplasty - performs 70 of these a year -- Bill Bell billnb...@gmail.com cell 720-256-8076
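For what it's worth, a per-provider volume boost like this is normally expressed as a function boost rather than as the query itself; a sketch of that, using the field names from the example above and assuming the edismax parser (whether the raw count should be scaled or log-dampened first is a separate tuning question):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class VolumeBoostQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("work:desks");  // the actual search
    q.set("defType", "edismax");
    q.set("bf", "num_desk^5.0");                // additive boost from the volume field
    System.out.println(solr.query(q).getResults());
    solr.shutdown();
  }
}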
Re: How to run a subsequent update query to documents indexed from a dataimport query
Hi Dileepa, If I understand correctly, this is what happens in your system: 1. DIH sends data to Solr 2. You have written a custom update processor ( http://wiki.apache.org/solr/UpdateRequestProcessor) which then asks your Stanbol server for metadata, adds it to the document and then indexes it. It's the part where you query the Stanbol server and wait for the response which takes time, and you want to reduce this. In my view, instead of waiting for the response from the Stanbol server and then indexing, you could send the required field data from the doc to your Stanbol server and continue. Once Stanbol has enriched the document, you re-index the document and update it with the metadata. This method makes you re-index the document but the changes from your client would be visible faster. Alternatively you could do the same thing at the DIH level by writing a custom Transformer ( http://wiki.apache.org/solr/DataImportHandler#Writing_Custom_Transformers) On Mon, Jan 27, 2014 at 10:44 AM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi Ahmet, On Mon, Jan 27, 2014 at 3:26 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Here is what I understand from your Question. You have a custom update processor that runs with DIH. But it is slow. You want to run that text enhancement component after DIH. How would this help to speed up things? In this approach you will read/query/search already indexed and committed solr documents and run text enhancement thing on them. Probably this process will add new additional fields. And then you will update these solr documents? Did I understand your use case correctly? Yes, that is exactly what I want to achieve. I want to separate out the enhancement process from the dataimport process. The dataimport process will be invoked by a client when new data is added/updated to the mysql database. Therefore the dataimport process with mandatory fields of the documents should be indexed as soon as possible. Mandatory fields are mapped to the data table columns in the data-config.xml and the normal /dataimport process doesn't take much time. The enhancements are done in my custom processor by sending the content field of the document to an external Stanbol[1] server to detect NLP enhancements. Then new NLP fields are added to the document (detected persons, organizations, places in the content) in the custom update processor and if this is executed during the dataimport process, it takes a lot of time. The NLP fields are not mandatory for the primary usage of the application which is to query documents with mandatory fields. The NLP fields are required for custom queries for Person, Organization entities. Therefore the NLP update process should be run as a background process detached from the primary /dataimport process. It should not slow down the existing /dataimport process. That's why I am looking for the best way to achieve my objective. I want to implement a way to separately update the imported documents from /dataimport to detect NLP enhancements. Currently I'm having the idea of adopting a timestamp based approach to trigger a /update query to all documents imported after the last_index_time in dataimport.prop and update them with NLP fields. Hope my requirement is clear :). Appreciate your suggestions. [1] http://stanbol.apache.org/ On Sunday, January 26, 2014 8:43 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi all, Any ideas on how to run a reindex update process for all the imported documents from a /dataimport query?
Appreciate your help. Thanks, Dileepa On Thu, Jan 23, 2014 at 12:21 PM, Dileepa Jayakody dileepajayak...@gmail.com wrote: Hi All, I did some research on this and found some alternatives useful to my usecase. Please give your ideas. Can I update all documents indexed after a /dataimport query using the last_indexed_time in dataimport.properties? If so can anyone please give me some pointers? What I currently have in mind is something like below; 1. Store the indexing timestamp of the document as a field eg: field name=timestamp type=date indexed=true stored=true default=NOW multiValued=false/ 2. Read the last_index_time from the dataimport.properties 3. Query all document id's indexed after the last_index_time and send them through the Stanbol update processor. But I have a question here; Does the last_index_time refer to when the dataimport is started(onImportStart) or when the dataimport is finished (onImportEnd)? If it's onImportEnd timestamp, them this solution won't work because the timestamp indexed in the document field will be : onImportStart doc-index-timestamp onImportEnd. Another alternative I can think of is trigger an update chain via a EventListener configured to run after a dataimport is processed
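A rough sketch of the second pass suggested above, done as an atomic update so that only the NLP fields change while the fields indexed by the fast /dataimport pass are left alone (atomic updates in 4.x require all other fields to be stored; the field name and the enhance() stub are placeholders for the real Stanbol client call):

import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class StanbolEnrichmentUpdater {
  // Placeholder: call Stanbol's enhancer endpoint and extract the entities.
  static List<String> enhance(String content) { return Collections.emptyList(); }

  public static void main(String[] args) throws Exception {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    String docId = "doc-42";
    String content = "...";                   // the stored content field of that doc
    List<String> persons = enhance(content);  // slow call, done outside /dataimport

    // Atomic update: "set" only the NLP field on the existing document.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", docId);
    doc.addField("person_entities", Collections.singletonMap("set", persons));
    solr.add(doc);
    solr.commit();
    solr.shutdown();
  }
}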
Re: Complication - can block joins help?
Is there an example of using payloads for 4.6? Without any custom code for this? On Sun, Jan 26, 2014 at 10:30 PM, William Bell billnb...@gmail.com wrote: OK, In order to do boosting, we often will create a dynamic field in SOLR. For example: A professional hires out for work, and I want to boost those who do woodworking. George Smith builds chairs and builds desks. He builds the most desks in the country (350 a year), and his closest competitor does 200 a year. id (integer) = 1 name (string) = George Smith work (multiValued field) = chairs, desks num_desk (dynamic field num*) = 500 Then I would do something like: q=num_desk^5.0 Is there a way to do this without a dynamic field? I thought about a field: desk|500 (use bar delimiter). But I couldn't see how to have the value indexed so it could easily be used to boost those who do the most. If you think of all the types of work, this could be 50,000 dynamic fields. Probably a performance hog. Dr. Smith - Angioplasty - performs 70 of these a year -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076